
Intelligence 35 (2007) 283–300

Score gains on g-loaded tests: No g

Jan te Nijenhuis a, Annelies E.M. van Vianen a, Henk van der Flier b

a Work and Organizational Psychology, University of Amsterdam, Amsterdam, The Netherlands
b Work and Organizational Psychology, Free University, Amsterdam, The Netherlands

Received 10 December 2005; received in revised form 23 June 2006; accepted 11 July 2006
Available online 14 August 2006

Abstract

IQ scores provide the best general predictor of success in education, job training, and work. However, there are many ways in which IQ scores can be increased, for instance by means of retesting or participation in learning potential training programs. What is the nature of these score gains? Jensen [Jensen, A.R. (1998a). The g factor: The science of mental ability. London: Praeger] argued that the effects of cognitive interventions on abilities can be explained in terms of Carroll's three-stratum hierarchical factor model. We tested his hypothesis using test–retest data from various Dutch, British, and American IQ test batteries combined into a meta-analysis, and learning potential data from South Africa using Raven's Progressive Matrices. The meta-analysis of 64 test–retest studies using IQ batteries (total N=26,990) yielded a correlation between g loadings and score gains of −1.00, meaning there is no g saturation in score gains. The learning potential study showed that: (1) the correlation between score gains and the g loadedness of item scores is −.39, (2) the g loadedness of item scores decreases after a mediated intervention training, and (3) low-g participants increased their scores more than high-g participants. So, our results support Jensen's hypothesis. The generalizability of test scores resides predominantly in the g component, while the test-specific ability component and the narrow ability component are virtually non-generalizable. As the score gains are not related to g, the generalizable g component decreases and, as it is plausible that the training itself is not g-loaded, it is easy to understand why the score gains did not generalize to scores on other cognitive tests and to g-loaded external criteria.
© 2006 Elsevier Inc. All rights reserved.

Keywords: g; IQ testing; g loading; Training; Coaching; Learning potential; Dynamic testing; South Africa; Test–retest; Score gains

1. Training and score gains

Scores on cognitive tests are the best general predictors of accomplishments in school and in the workplace, and it is predominantly the g component of the IQ tests that is responsible for this criterion-related validity (Ree & Earles, 1991; Ree, Earles, & Teachout, 1994; Thorndike, 1985). At the same time, IQ test scores can be increased by various forms of training. Kulik, Bangert-Drowns, and Kulik's (1984) meta-analysis on test preparation studies resulted in effect sizes on intelligence tests for practice and additional coaching of 0.25 S.D. and 0.51 S.D., respectively. Dynamic testing (Grigorenko & Sternberg, 1998) focuses on what children learn in a special training program in an attempt to go beyond IQ scores. A general finding is that scores go up by 0.5 to 0.7 S.D. after a dynamic training (Swanson & Lussier, 2001). Ericsson and Lehmann (1996) report immense score increases after intensive training, for instance on a memory task very similar to the subtest Forward Digit Span of the WISC. It is clear that IQ scores can be increased by training. The question is what inferences can be drawn from these gains. Do they represent true increases in mental ability, or simply increases in performance on a particular test instrument?

Jan te Nijenhuis and Annelies van Vianen, Work and Organizational Psychology, University of Amsterdam, Amsterdam, the Netherlands, and Henk van der Flier, Work and Organizational Psychology, Free University, Amsterdam, the Netherlands.
Corresponding author. Gouden Leeuw 746, 1103 KR Amsterdam, The Netherlands. E-mail address: JanteNijenhuis@planet.nl (J. te Nijenhuis).

0160-2896/$ - see front matter © 2006 Elsevier Inc. All rights reserved.
doi:10.1016/j.intell.2006.07.006

2. Jensen's hypothesis: score gains can be summarized in the hierarchical intelligence model

Jensen (1998a, ch. 10) hypothesized that the effects of training on abilities can be summarized in terms of Carroll's (1993) three-stratum hierarchical factor model of cognitive abilities. At the highest level of the hierarchy (stratum III) is general intelligence, or g. One level lower (stratum II) are the broad abilities: Fluid Intelligence, Crystallized Intelligence, General Memory and Learning, Broad Visual Perception, Broad Auditory Perception, Broad Retrieval Ability, and Broad Cognitive Speediness or General Psychomotor Speed. One level lower still (stratum I) are the narrow abilities, such as Sequential Reasoning, Quantitative Reasoning, Verbal Abilities, Memory Span, Visualization, and Perceptual Speed. At the lowest level of the hierarchy are large numbers of specific tests and subtests. Some tests, despite seemingly very different formats, have been demonstrated empirically to cluster into one narrow ability (Carroll, 1993).

It is hypothesized that a training effect is most clearly manifested at the lowest level of the hierarchy of intelligence, namely on the specific tests that most resemble the trained skills. One hierarchical level higher, the training effect is still evident for certain narrow abilities, depending on the nature of the training. However, the gain virtually disappears at the level of broad abilities and is altogether undetectable at the highest level, g. This implies that the transfer of training effects is strongly limited to tests or tasks that are all dominated by one particular narrow skill or ability. There is virtually no transfer across tasks dominated by different narrow abilities, and transfer disappears completely before reaching the level of g. Thus, there is an increase in narrow abilities or test-specific ability that is independent of g. Test-specific ability is defined as that part of a given test's true-score variance that is not common to any other test; i.e., it lacks the power to predict performance on any other tasks except those that are highly similar. Gains on test specificities are therefore not generalizable, but ‘empty’ or ‘hollow’. Only the g component is highly generalizable. Jensen (1998a, ch. 10) gives various examples of empty score gains, including a detailed analysis of the Milwaukee project in which he argues that IQ scores rose but g scores did not. Another example of empty score gains is given by Christian, Bachnan, and Morrison (2001), who state that increases due to schooling show very little transfer across domains.
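The definition of test-specific ability above corresponds to the classical factor-analytic split of a standardized test's variance into a communality h² (variance shared with other tests through g and the broad and narrow factors), a specificity s² = r_xx − h² (reliable but unique to the test), and error variance 1 − r_xx. The sketch below assumes orthogonal factors, so that h² is simply the sum of squared loadings; the function name, the loadings, and the reliability are hypothetical illustration values.

```python
def variance_components(reliability, loadings):
    """Split a standardized test's variance into common, specific and error parts.

    Assumes orthogonal factors, so the communality (variance shared with other
    tests) is the sum of squared factor loadings.
    """
    communality = sum(lam ** 2 for lam in loadings)
    if communality > reliability:
        raise ValueError("communality cannot exceed reliability")
    return {
        "common": communality,                  # shared with other tests (g, broad, narrow factors)
        "specific": reliability - communality,  # reliable but test-specific
        "error": 1.0 - reliability,             # measurement error
    }


# Hypothetical test: g loading .60, one narrow-ability loading .30, reliability .85
parts = variance_components(0.85, [0.60, 0.30])
print({name: round(value, 2) for name, value in parts.items()})
```

In this example .45 of the variance is common, .40 is specific, and .15 is error. In the terms used above, a training gain located in the .40 specific portion would not be expected to transfer to other tests.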

It is hypothesized that the g loadings of the few tests that are most similar to the trained skills, and therefore most likely to reflect the specific training, diminish after training. That is, after training these particular tests reflect the effect of the specific training rather than the general ability factor.

It is one of the most firmly established facts in the social sciences that IQ tests have a high degree of predictive validity for educational criteria (Jensen, 1980; Schmidt & Hunter, 1998), meaning that high-g persons virtually always learn more than low-g persons. For instance, Kulik, Kulik, et al.'s (1984) meta-analysis reported practice effects on intelligence tests of 0.80 S.D., 0.40 S.D., and 0.17 S.D. for subjects of high, middle, and low ability, respectively. In industrial psychology, the more complex the training or job, the higher the correlation of performance with g (Schmidt & Hunter, 1998). This means that training or job situations, and also educational settings, vary in the degree to which they are g-loaded (Gottfredson, 1997, 2002). However, Ackerman (1987) cites several classical studies on the acquisition of simple skills through often-repeated exercise in which low-g persons made the most progress. These findings could be interpreted as an indication that this specific skill acquisition process is not g-loaded. It may be that some of the various forms of training referred to above also show the largest gains for low-g persons.

There are many ways to test Jensen's hypothesis. Below we address (1) studies on repeated testing and g loadedness, (2) studies on practice and coaching, and (3) studies on learning potential. The practice studies used a pretest–posttest design, whereas both the coaching and the learning potential studies used a pretest–intervention–posttest design.

3. First test of Jensen's hypothesis: studies on repeated testing and g loadedness

What do we find after repeated test taking? In a classic study by Fleishman and Hempel (1955), as subjects were repeatedly given the same psychomotor tests, the g loading of the tests gradually decreased and each task's specificity increased. Neubauer and Freudenthaler (1994) showed that after 9 h of practice the g loading of a modestly complex intelligence test dropped from .46 to .39. Te Nijenhuis, Voskuijl, and Schijve (2001) showed that after various forms of test preparation the g loadedness of their test battery decreased from .53 to .49. Based on the work of Ackerman (1986, 1987), it can be concluded that through practice on cognitive tasks part of the performance becomes overlearned and automatic; the performance requires less controlled processing of information, which is reflected in lowered g loadings.

4. Second test of Jensen's hypothesis: studies on practice and coaching

Three studies on practice and coaching have shown increases in test scores that are not related to the g factor. This suggests that the gains are ‘empty’ or ‘hollow’. In the first study, Jensen (1998a, ch. 10) analyzed the effect of practice on the General Aptitude Test Battery (GATB). He found negative correlations ranging from −.11 to −.86 between effect sizes on practice and the tests' g loadings; the gains were therefore largest on the least cognitively complex tests. In the second study, te Nijenhuis et al. (2001) found a small correlation of −.08 for test practice and large negative correlations of −.87 for both of their test coaching conditions. In addition, Jensen carried out a factor analysis of the various GATB score gains and found two large factors that did not correlate with the g factor extracted from the GATB. Most likely the score gains are not on the g factor or the broad abilities but on the test specificities, since te Nijenhuis et al. showed that practice and coaching reduce the g loadedness of their tests. In a third study (Coyle, 2006), factor analysis demonstrated that the change in aptitude test scores had a zero loading on the g factor.

So the studies on practice and coaching appear to support the theory. However, since only a few empirical studies have tested the link (or absence thereof) between gains in test score from practice and coaching and g loadings, replications are required before the conclusion can be firmly established. Therefore, we combined several such studies with various Dutch, British, and American test batteries into a meta-analysis.

5. Third test of Jensen's hypothesis: studies on learning potential

Jensen hypothesizes that the effects of training are not on g but that the gains are empty, and that training should therefore not lead to increased predictive validity. Based on learning potential theory, one would come to the opposite prediction, namely that training leads to higher predictive validity. The fact that the theoretical framework of learning potential does not include the g factor is of no importance here: we focus solely on a prediction based on learning potential theory that is opposite to a prediction based on Jensen's theory, which rests on a hierarchical intelligence model. Some learning potential training studies report predictive validities of pre- and posttest scores. Based on Jensen's theory, one would predict (1) no higher predictive validity for learning potential tests in comparison with classical cognitive tests, and (2) no increase in predictive validity due to training when using posttest scores instead of pretest scores. However, based on learning potential theory, one would predict a substantial increase in predictive validity in both cases. So studies on learning potential constitute a test of Jensen's hypothesis.

A large number of studies have been carried out to check for learning potential beyond IQ scores, generally showing that scores go up substantially after mediation. Apart from theoretical considerations, dynamic tests should show higher criterion-related validities than classical IQ tests to justify the time-consuming procedure. Based on a lengthy review of most of the literature, Grigorenko and Sternberg (1998) concluded that the empirical data do not consistently show higher predictive power of dynamic tests compared with traditional tests. Murphy (2002) did an excellent and detailed review of all South African studies on learning potential, including virtually all of those missed by Grigorenko and Sternberg (probably due to difficulty of access). Many studies (Boeyens, 1989; de Villiers, 1999; Engelbrecht, 1999; Gaydon, 1988; Haeck, Yeld, Conradie, Robertson, & Shall, 1997; Lipson, 1992; Nel, 1997; Shochet, 1986; Skuy et al., 2002; Yeld & Haeck, 1997; Zaaiman, 1998; Zaaiman, van der Flier, & Thijs, 2001; Zolezzi, 1992, 1995) used data from the numerous South African university entrance programs that had adopted a dynamic framework for assessing disadvantaged, underprepared students. The aim of these programs was to give underprepared applicants an optimal chance to prove that they have the ability to succeed with further study. Again, it was found that while some South African studies show higher criterion-related validities for learning potential tests, the effect was not consistent.

However, in these studies the learning potential tests were compared against individual tests or an unweighted combination of a limited number of tests, but generally not against a full test battery and in no case against g scores. g scores have been shown to yield higher predictive validities than individual tests or an unweighted score sum (Ree & Earles, 1991; Ree et al., 1994; Thorndike, 1985). So in these comparisons the cognitive predictor with the highest predictive validity was not used; instead, the dynamic tests were pitted against predictors with substantially lower predictive validities than g. As no direct comparisons were made between learning potential tests and g, it is not possible to conclude that g had higher predictive validity. However, since a comparison of a learning potential test with one test or a combination of a limited number of tests generally results in comparable predictive validities, and g scores clearly have higher predictive validities than one test or a combination of a limited number of tests, it is plausible that a g score will have a higher predictive validity than a learning potential test score. This also suggests that the findings might best be interpreted as tentative support for Jensen's theory.
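The argument above rests partly on the point that combining tests raises predictive validity above that of a single test. For standardized tests, the validity of a weighted composite is w·v / √(wᵀRw), where R is the test intercorrelation matrix, v the vector of test-criterion correlations, and w the weights. The sketch below, with a hypothetical function name and made-up numbers, compares a unit-weighted sum of four tests with one test alone; it does not compute g scores, which would require factor-based weights.

```python
import numpy as np


def composite_validity(R, validities, weights=None):
    """Correlation of a weighted sum of standardized tests with one criterion.

    R          -- k x k intercorrelation matrix of the tests
    validities -- length-k vector of test-criterion correlations
    weights    -- composite weights (unit weights if None)
    """
    R = np.asarray(R, dtype=float)
    v = np.asarray(validities, dtype=float)
    w = np.ones(len(v)) if weights is None else np.asarray(weights, dtype=float)
    cov_with_criterion = w @ v         # covariance of composite with the standardized criterion
    composite_sd = np.sqrt(w @ R @ w)  # standard deviation of the composite
    return cov_with_criterion / composite_sd


# Hypothetical battery: four tests, all intercorrelated .50, each with validity .30
k = 4
R = np.full((k, k), 0.5)
np.fill_diagonal(R, 1.0)
v = [0.30] * k
print(round(composite_validity(R, v), 3))              # unit-weighted sum of all four tests
print(round(composite_validity(R[:1, :1], v[:1]), 3))  # a single test on its own
```

With these inputs the unit-weighted sum reaches about .38, against .30 for any single test.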

So the studies on learning potential appear to support the theory that score gains can be summarized in the hierarchical intelligence model. However, more direct tests of the theory are required, and therefore a learning potential study was reanalyzed.

6. Research questions

The research question of this study is whether score gains from test–retest studies and mediated interventions can be summarized in terms of Carroll's three-stratum hierarchical intelligence model. We examined whether (1) correlations between score gains and the g loadedness of the scores are negative in sign, (2) the g loadedness of scores decreases after mediation, and (3) low-g persons show the largest gains after the mediation training. We carried out a meta-analysis to provide a convincing answer to the first research question. In a more exploratory study on learning potential in South Africa, we tried to find support for all three research questions.

7. Test–retest studies

To test whether there is a negative correlation between the g loadings of tests and score gains, we carried out a meta-analysis of all test–retest studies of Dutch, British, and American test batteries available in the Netherlands. All studies were simple practice studies (no intervention such as additional coaching took place) and used well-validated tests.

8. Method

Psychometric meta-analysis (Hunter & Schmidt, 1990) aims to estimate what the results of studies would have been if all studies had been conducted without methodological limitations or flaws. The results of perfectly conducted studies would allow a less obstructed view of the underlying construct-level relationships (Schmidt & Hunter, 1999). One of the goals of the present meta-analysis is to obtain a reliable estimate of the true correlation between standardized test–retest score gains (d) and g. Although the construct of g has been thoroughly studied, the construct underlying score gains is less well understood. A related aim of the present study is to gain a clearer understanding of the construct underlying score gains by linking it to the g nexus. Carrying out a complete meta-analysis on the relationship between d and g would require the collection of a very large number of datasets. However, applying meta-analytical techniques to a sufficiently large number of studies will also lead to a reliable estimate of the true correlation between d and g. We therefore collected a large number of studies, heterogeneous across various possible moderators.

To get a reliable correlation between g and d, we focused on batteries with a minimum of seven subtests. Libraries and test libraries of universities were searched, and several members of the Dutch Testing Commission and test publishers were contacted. We limited ourselves to non-clinical samples without health problems. Only a minority of test manuals report test–retest studies; before 1970 they are especially rare. The search yielded virtually all test–retest studies available in the Netherlands. The GATB manual (1970, ch. 20) reports very large datasets on secondary school children who took the GATB with intervals of 1, 2, and 3 years. At the time of the first test, large samples of children who had the same age as the test–retest children at the time of the second test also took the test. Through a comparison of the scores, the maturation effects could be separated from the test–retest effects, so we included these data in the present study.
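One straightforward way to make the comparison described above, assuming the same-age first-time testees are comparable to the retest sample: the same-age first-time mean minus the pretest mean is the maturation effect, the retest mean minus the same-age first-time mean is the retest effect net of maturation, and both are expressed in pretest SD units. The function name and all means and SDs are hypothetical, and the GATB manual's exact procedure may differ.

```python
def separate_gain(pretest_mean, retest_mean, same_age_first_test_mean, pretest_sd):
    """Split a raw test-retest gain into maturation and retest parts, in pretest SD units.

    Assumes the same-age first-time sample is comparable to the retest sample.
    """
    total = (retest_mean - pretest_mean) / pretest_sd
    maturation = (same_age_first_test_mean - pretest_mean) / pretest_sd
    retest = (retest_mean - same_age_first_test_mean) / pretest_sd
    return {"total": total, "maturation": maturation, "retest": retest}


# Hypothetical subtest: pretest mean 100, retest mean 112 one year later,
# first-time testees of the same age as the retest sample average 106, pretest SD 20
components = separate_gain(100.0, 112.0, 106.0, 20.0)
print(components)
```

Here the total gain of 0.60 SD splits into 0.30 SD of maturation and 0.30 SD of retest effect; under this approach only the retest component would be used as d.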

Standardized score gains were computed by dividing the raw score gain by the SD of the pretest. In general, g loadings were computed by submitting a correlation matrix to a principal axis factor analysis and using the loadings of the subtests on the first unrotated factor. In some cases, g loadings were taken from studies where other procedures were followed; these procedures have been shown empirically to lead to highly comparable results. Pearson correlations between the standardized score gains and the g loadings were then computed.
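The three steps just described (d as raw gain over pretest SD, g loadings from the first unrotated principal axis factor, then a Pearson correlation) can be sketched in Python with NumPy. This is a minimal illustration, not the authors' software: the function names are hypothetical, the principal axis solution extracts a single factor and iterates communalities starting from squared multiple correlations (an assumption; extracting more factors changes the communalities and can shift the first-factor loadings slightly), and all numbers are made up.

```python
import numpy as np


def first_factor_loadings(R, max_iter=500, tol=1e-10):
    """Loadings on the first factor of an iterated one-factor principal axis solution."""
    R = np.asarray(R, dtype=float)
    # Initial communalities: squared multiple correlations, 1 - 1/diag(R^-1)
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    for _ in range(max_iter):
        reduced = R.copy()
        np.fill_diagonal(reduced, h2)               # communalities on the diagonal
        eigvals, eigvecs = np.linalg.eigh(reduced)  # eigenvalues in ascending order
        loadings = eigvecs[:, -1] * np.sqrt(eigvals[-1])
        new_h2 = loadings ** 2
        converged = np.max(np.abs(new_h2 - h2)) < tol
        h2 = new_h2
        if converged:
            break
    # The sign of an eigenvector is arbitrary; orient the factor positively
    return loadings if loadings.sum() >= 0 else -loadings


def standardized_gains(pre_means, post_means, pre_sds):
    """d = (retest mean - pretest mean) / pretest SD, per subtest."""
    pre = np.asarray(pre_means, dtype=float)
    post = np.asarray(post_means, dtype=float)
    return (post - pre) / np.asarray(pre_sds, dtype=float)


def r_gd(g_loadings, d):
    """Pearson correlation between the vector of g loadings and the vector of d."""
    return np.corrcoef(g_loadings, d)[0, 1]


# Hypothetical five-subtest battery generated from an exact one-factor model
true_g = np.array([0.8, 0.7, 0.6, 0.5, 0.4])
R = np.outer(true_g, true_g)
np.fill_diagonal(R, 1.0)

g = first_factor_loadings(R)
d = standardized_gains([30] * 5, [31.5, 31.0, 33.0, 32.5, 34.5], [10] * 5)
print("g loadings:", np.round(g, 2))
print("d:", np.round(d, 2))
print("r_gd:", round(r_gd(g, d), 2))
```

Because the correlation matrix is built from an exact one-factor model, the recovered loadings match the generating values (.80 down to .40). The hypothetical gains were chosen to be larger on the less g-loaded subtests, so r_gd comes out strongly negative (about −.87).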

Psychometric meta-analytical techniques (Hunter & Schmidt, 1990, 2004) were applied to the resulting 64 r_gd values using the software package developed by Schmidt and Le (2004). Psychometric meta-analysis is based on the principle that there are artifacts in every dataset and that most of these artifacts can be corrected. In the present study, we corrected for five artifacts, listed by Hunter and Schmidt (1990), that alter the value of outcome measures: (1) sampling error, (2) reliability of the vector of g loadings, (3) reliability of the vector of score gains, (4) restriction of range of g loadings, and (5) deviation from perfect construct validity.

8.1. Correction for sampling error

In many cases, sampling error explains the majority of the variation between studies, so the first step in a psychometric meta-analysis is to correct the collection of effect sizes for differences in sample size between the studies.
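A minimal sketch of this first correction, assuming the standard bare-bones formulas of Hunter and Schmidt: the N-weighted mean correlation, the N-weighted observed variance of the correlations, and the variance expected from sampling error alone, (1 − mean_r²)² / (mean_N − 1), where mean_N is the average sample size. The function name is hypothetical, and the input uses only the first five rows of Table 1 for illustration, so the output is not the paper's meta-analytic result.

```python
import numpy as np


def bare_bones(rs, ns):
    """Bare-bones meta-analysis of correlations in the Hunter-Schmidt style.

    Returns the N-weighted mean r, the N-weighted observed variance of r,
    the variance expected from sampling error alone, the residual variance,
    and the percentage of observed variance attributable to sampling error.
    """
    rs = np.asarray(rs, dtype=float)
    ns = np.asarray(ns, dtype=float)
    mean_r = np.sum(ns * rs) / np.sum(ns)
    var_obs = np.sum(ns * (rs - mean_r) ** 2) / np.sum(ns)
    var_err = (1.0 - mean_r ** 2) ** 2 / (ns.mean() - 1.0)
    var_res = max(var_obs - var_err, 0.0)
    pct = 100.0 if var_obs == 0 else min(100.0, 100.0 * var_err / var_obs)
    return mean_r, var_obs, var_err, var_res, pct


# First five rows of Table 1 (illustration only, not the full set of 64 correlations)
rs = [-0.57, -0.45, -0.21, -0.07, -0.42]
ns = [100, 42, 42, 43, 96]
mean_r, var_obs, var_err, var_res, pct = bare_bones(rs, ns)
print(f"mean r = {mean_r:.2f}, observed var = {var_obs:.4f}, "
      f"sampling-error var = {var_err:.4f}, explained = {pct:.0f}%")
```

Whatever share of the observed variance sampling error does not explain is left for the remaining artifact corrections, and any true moderators, to account for.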

8.2. Correction for reliability of the vector of g loadings

The value of r_gd is attenuated by the reliability of the vector of g loadings for a given battery. When two samples have a comparable N, the average correlation between vectors is an estimate of the reliability of each vector. The collection of datasets in the present study included no g vectors for the same battery from different samples, and therefore artifact distributions were based upon other studies reporting g vectors for two or more samples. So the effect sizes and the distribution of reliabilities of the g vector were based upon different samples. When two g vectors were compared, the correlation between them was used, and when more than two g vectors were compared, the average correlation for the various combinations of two vectors was used. The combined N from the samples on which the g vector was based was taken as the weight of one data point.

Several samples were compared that differed little on background variables. For the comparisons using children, we chose samples that were highly comparable with regard to age, and for the comparisons of adults, we chose samples that were roughly comparable with regard to age. In a study on young children, Schroots and van Alphen de Veer (1979) report correlation matrices for the Leidse Diagnostische Test for eight age groups between 4 and 8 years of age. The average correlation between the adjacent age groups is .75 (combined N=1169). Several studies report data on both younger and older children. The Dutch/Flemish WISC-R (van Haasen et al., 1986) has samples with comparable N of Dutch and Flemish children, so the 11 age groups between 6 and 16 could be compared. This resulted in an average correlation of .78 (combined N=3018). Jensen (1985) reports g loadings of the 12 subtests of the WISC-R obtained in three large, independent, representative samples of Black and White children. The average correlation between the g vectors obtained for each sample is .86 for the Black children (combined N=1238) and .93 for the White children (combined N=2868). In a study on older children, Evers and Lucassen (1991) report the correlation matrices of the Dutch DAT. The average correlation between the g vectors of three educational groups is .88 (combined N=3300). The US GATB manual (1970, ch. 20) gives correlation matrices for large groups of boys and girls in secondary school. The average correlation between the g vectors of the same-age boys and girls is .97 (combined N=26,708). Several studies report data on adults. g loadings of the eight subtests of the GATB are reported by te Nijenhuis and van der Flier (1997) for applicants at Dutch Railways and by de Wolff and Buiten (1963) for seamen at the Royal Dutch Navy, resulting in a correlation of .90 (combined N=1306). The US GATB manual (1970) gives correlation matrices for two large groups of adults, which yields a correlation between g vectors of .94 (combined N=4519). Johnson, Bouchard, Krueger, McGue, and Gottesman (2004) report g loadings for a sample that took the WAIS, and Wechsler (1955) reports the correlation matrices of the WAIS for adults of comparable age, so g loadings could be computed. The correlation between the g vectors for the two studies is .72 (combined N=736). So it appears that g vectors are quite reliable, especially when the samples are very large.
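The procedure described in the two paragraphs above can be sketched as follows: a g vector's reliability is the correlation between two samples' vectors (or the mean of all pairwise correlations for three or more samples), and each such value enters the artifact distribution as one data point weighted by its combined N. The helper names and the three g vectors are hypothetical; the nine reliabilities and combined Ns are the ones listed in the paragraph above. How the Schmidt and Le program summarizes the distribution internally is not shown here.

```python
from itertools import combinations

import numpy as np


def vector_reliability(vectors):
    """Correlation between two samples' g vectors, or the mean of all pairwise
    correlations when more than two samples are available."""
    pairs = combinations(range(len(vectors)), 2)
    return float(np.mean([np.corrcoef(vectors[i], vectors[j])[0, 1] for i, j in pairs]))


def weighted_distribution(values, weights):
    """Weighted mean and SD of an artifact distribution (one data point per comparison)."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    mean = np.average(values, weights=weights)
    sd = np.sqrt(np.average((values - mean) ** 2, weights=weights))
    return float(mean), float(sd)


# Hypothetical g vectors for one five-subtest battery in three samples
g_vectors = [
    [0.70, 0.62, 0.55, 0.48, 0.40],
    [0.72, 0.58, 0.57, 0.45, 0.43],
    [0.68, 0.64, 0.50, 0.50, 0.38],
]
print(f"reliability of the g vector: {vector_reliability(g_vectors):.2f}")

# Average vector correlations and combined Ns as reported in the text
reliabilities = [0.75, 0.78, 0.86, 0.93, 0.88, 0.97, 0.90, 0.94, 0.72]
combined_ns = [1169, 3018, 1238, 2868, 3300, 26708, 1306, 4519, 736]
mean, sd = weighted_distribution(reliabilities, combined_ns)
print(f"N-weighted mean = {mean:.3f}, SD = {sd:.3f}")
```

With these inputs the weighted mean is about .93, pulled upward by the single GATB comparison with a combined N of 26,708.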

The number of tests in the batteries in the present study varied from 7 to 14. The number of tests does not necessarily influence the size of r_gd, but clearly has an effect upon its variability. Because variability in the values of the artifacts influences the amount of variance artifacts explain in observed effect sizes, we estimated this variability using data from the samples described in the previous paragraph.
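To see why the number of subtests matters for the variability of r_gd, one can treat the k subtests of a battery as the "cases" of the correlation and use the Fisher z approximation, whose standard error is 1/√(k − 3). This is only a rough heuristic (subtests are not a random sample, and the approximation assumes bivariate normality); the function name and the r value of −.50 are hypothetical.

```python
import math


def fisher_z_interval(r, k, z_crit=1.96):
    """Approximate 95% interval for a correlation computed over k subtests.

    Uses the Fisher z transformation, treating the subtests as cases:
    SE(z) = 1 / sqrt(k - 3).
    """
    if k <= 3:
        raise ValueError("need more than three subtests")
    z = math.atanh(r)
    se = 1.0 / math.sqrt(k - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


for k in (7, 14):  # the smallest and largest battery sizes in the meta-analysis
    low, high = fisher_z_interval(-0.50, k)
    print(f"k = {k:2d}: r_gd = -.50, approx. 95% interval {low:.2f} to {high:.2f}")
```

For the same r_gd, the 7-subtest interval is considerably wider than the 14-subtest one, which is the sense in which battery length affects the variability of r_gd without necessarily changing its expected size.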

8.3. Correction for reliability of the vector of score gains

The value of r_gd is attenuated by the reliability of the vector of score gains for a given battery. When two samples have a comparable N, the average correlation between vectors is an estimate of the reliability of each vector. The reliability of the vector of score gains was estimated using the present datasets, comparing samples that took the same test and that differed little on background variables. For the comparisons using children, we chose samples that were highly comparable with regard to age, and for the comparisons of adults, we chose samples that were roughly comparable with regard to age.

In the GATB manual (1970, ch. 15), 13 combinations of two studies are described in which large samples of men and women that are comparable with respect to age and background took the same GATB subtests. The average unweighted correlation between the d vectors of men and women is .83 (total N=3760). In the GATB manual (1970, ch. 20), three combinations of three studies are described in which very large samples of boys and girls in the same grade in secondary school took the same GATB subtests. This yielded correlations between the d vectors of .99, .98, and .94, respectively (total N=20,541). Together, van Geffen (1972) and Bosch (1973) report three Dutch GATB test–retest studies on children in secondary school, resulting in three comparisons between d vectors. The average N-weighted correlation between the d vectors is .47 (total N=127). Vectors of score gains from two different datasets on the WISC-R were compared: Tuma and Appelbaum (1980) tested children with an average age of 10, and Wechsler (1974) tested 10- and 11-year-olds. The correlation between the two d vectors is .71 (total N=147). Comparison of vectors of score gains from datasets on the DAT (Bennett, Seashore, & Wesman, 1974) resulted in correlations of .78 and .73, respectively, an average r of .76 (total N=254). So it appears that d vectors are quite reliable, especially when the samples are very large. We estimated the reliabilities of the d vectors in the database using data from the samples described in this paragraph.
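For a single study, correcting r_gd for the unreliability of both vectors amounts to the classical correction for attenuation: divide by the square root of the product of the two reliabilities. In the meta-analysis these reliabilities enter through artifact distributions rather than study by study, so the sketch below (hypothetical function name and input values) only shows the direction and rough size of the correction.

```python
import math


def disattenuate(r_obs, rel_g, rel_d):
    """Classical correction for attenuation of a correlation between two vectors."""
    if not (0 < rel_g <= 1 and 0 < rel_d <= 1):
        raise ValueError("reliabilities must lie in (0, 1]")
    return r_obs / math.sqrt(rel_g * rel_d)


# Hypothetical inputs: observed r_gd, reliability of the g vector, reliability of the d vector
print(round(disattenuate(-0.45, 0.86, 0.76), 2))
```

With an observed r_gd of −.45 and reliabilities of .86 and .76, the corrected value is about −.56; the less reliable the vectors, the larger the correction.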

8.4. Correction for restriction of range of g loadings

The value of r_gd is attenuated by the restriction of range of g loadings in many of the standard test batteries. The most highly g-loaded batteries tend to have the smallest range of variation in the subtests' g loadings. Jensen (1998a, pp. 381–382) shows that restriction in g loadedness strongly attenuates the correlation between g loadings and standardized group differences. Hunter and Schmidt (1990, pp. 47–49) state that the solution to range variation is to define a reference population and express all correlations in terms of that reference population. The Hunter and Schmidt meta-analytical program computes what the correlation in a given population would be if the standard deviation were the same as in the reference population. The standard deviations can be compared by dividing the study population standard deviation by the reference group population standard deviation, that is, u = SD_study/SD_ref. As the reference, we took the tests that are broadly regarded as exemplary for the measurement of the intelligence domain, namely the various versions of the Wechsler tests for children. The average standard deviation of g loadings of the various Dutch and US versions of the WISC-R and the WISC-III was 0.128. So the SD of g loadings of all test batteries was compared to the average SD of g loadings in the Wechsler tests for children. This resulted in some batteries, such as the GATB, having a value of u larger than 1.00.
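A sketch of the range correction, assuming the classical Thorndike Case II formula for direct range restriction with U = 1/u = SD_ref/SD_study, giving ρ = U·r / √(1 + r²(U² − 1)). The reference SD of 0.128 is taken from the text; the function name, the battery's g loadings, the observed r_gd, and the use of the population SD of the loadings are assumptions for illustration, and the Schmidt and Le program may implement the correction differently (for example, through artifact distributions).

```python
import math
import statistics

REFERENCE_SD = 0.128  # average SD of g loadings in Dutch and US WISC-R / WISC-III (from the text)


def range_correct(r, u):
    """Thorndike Case II correction of r toward the reference population.

    u = SD_study / SD_ref. For u < 1 the correction increases |r|; for u > 1 it decreases |r|.
    """
    big_u = 1.0 / u
    return big_u * r / math.sqrt(1.0 + r ** 2 * (big_u ** 2 - 1.0))


# Hypothetical seven-subtest battery with a narrow spread of g loadings
g_loadings = [0.72, 0.68, 0.66, 0.63, 0.60, 0.58, 0.55]
u = statistics.pstdev(g_loadings) / REFERENCE_SD  # population SD: an assumption
print(f"u = {u:.2f}, corrected r_gd = {range_correct(-0.40, u):.2f}")
```

Here u is about .43, so the correction raises |r_gd| from .40 to about .71; for a battery with u above 1.00, such as the GATB, the same formula shrinks the correlation instead.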

8.5. Correction for deviation from perfect construct validity

The deviation from perfect construct validity in g attenuates the value of r_gd. In making up any collection of cognitive tests, we do not have a perfectly representative sample of the entire universe of all possible cognitive tests. So any one limited sample of tests will not yield exactly the same g as any other limited sample. The sample values of g are affected by psychometric sampling error, but the fact that g is very substantially correlated across different test batteries implies that the differing obtained values of g can all be interpreted as estimates of a “true” g. The value of r_gd is attenuated by psychometric sampling error in each of the batteries from which a g factor has been extracted.

The more tests and the higher their g loadings, the higher the g saturation of the composite score. The Wechsler tests have a large number of subtests with quite high g loadings, resulting in a highly g-saturated composite score. Jensen (1998a, pp. 90–91) states that the g score of the Wechsler tests correlates more than .95 with the tests' IQ score. However, shorter batteries with a substantial number of tests with lower g loadings will lead to a composite with a somewhat lower g saturation. Jensen (1998a, ch. 10) states that the average g loading of an IQ score as measured by various standard IQ tests is in the +.80s. When we take this value as an indication of the degree to which an IQ score is a reflection of “true” g, we can estimate that a test's g score correlates about .85 with “true” g. As g loadings are the correlations of tests with the g score, it is most likely that most empirical g loadings will underestimate “true” g loadings, so empirical g loadings correlate about .85 with “true” g loadings. As the Schmidt and Le computer program only includes corrections for the first four artifacts, the correction for deviation from perfect construct validity was carried out on the value of r_gd after correction for the first four artifacts. To limit the

Table 1
Dutch, British, and US studies of correlations between g loadings and gain scores

Reference                                 Test         r      N    Information
Drenth et al. (1968)                      AKIT      −.57    100    Primary-school children
van Geffen (1972)                         GATB      −.45     42    Secondary-school children
                                                    −.21     42
Bosch (1973)                              GATB      −.07     43    Secondary-school children
Schroots and van Alphen de Veer (1979)    LDT       −.42     96    Pre-school and secondary-school children
Bleichrodt et al. (1987)                  RAKIT      .09     49    Pre-school children
                                                    −.25     51    Primary-school children
                                                    −.21     49    Primary-school children
van der Doef et al. (1989)                WISC-R    −.69     22    Primary-school children with learning problems
Mulder et al. (2004)                      KAIT      −.23     46    Secondary-school children + young adults
                                                    −.42     25    Adults
Kort et al. (2005)                        WISC-III  −.15     42    Primary-school children
                                                    −.26     67    Primary-school children
                                                    −.46     39    Secondary-school children
Luteijn and Barelds (2005)                GIT2      −.51     44    Adults
Kooij et al. (2005)                       WAIS-III  −.63     60    Adults
Elliott (1983)                            BAS       −.65     60    Primary-school children
Wechsler (1967)                           WPPSI     −.46     50    Pre-school children
United States Department of Labor (1970)  GATB      −.35    156    Office applicants
                                                    −.66    605    Male high school seniors
                                                    −.70    554    Female high school seniors
                                                    −.58    223    Males 1-day interval
                                                    −.41    186    Females 1-day interval
                                                    −.50    202    Males 2-week interval
                                                    −.52    152    Females 2-week interval
                                                    −.67    156    Males 6-week interval
                                                    −.61    168    Females 6-week interval
                                                    −.43    176    Males 13-week interval
                                                     .02    149    Females 13-week interval
                                                    −.62    157    Males 26-week interval
                                                    −.32    136    Females 26-week interval
                                                    −.69    119    Males 1-year interval
                                                    −.31    183    Females 1-year interval
                                                    −.96    118    Males 2-year interval
                                                    −.75    170    Females 2-year interval
                                                    −.75    123    Males 3-year interval
                                                    −.48    183    Females 3-year interval
                                                    −.92   3398    Boys secondary school
                                                    −.92   3680    Girls secondary school
                                                    −.91   3348    Boys secondary school
                                                    −.91   3491    Girls secondary school
                                                    −.84   3229    Boys secondary school
                                                    −.87   3395    Girls secondary school
Wechsler (1974)                           WISC-R    −.48     97    Primary-school children
                                                    −.66    102    Primary-school children
                                                    −.21    104    Secondary-school children
Bennett et al. (1974)                     DAT       −.79     92    Boys secondary school
                                                    −.53     81    Girls secondary school
                                                    −.29     81    Boys secondary school
                                                    −.62    100    Girls secondary school
Covin (1977)                              WISC-R    −.57     30    Primary-school children with learning problems
Tuma and Appelbaum (1980)                 WISC-R    −.08     45    Primary- and secondary-school children
Matarazzo et al. (1980)                   WAIS      −.10     29    Young males
Wechsler (1981)                           WAIS-R    −.64     71    Adults
                                                    −.48     48    Adults


289 J. te Nijenhuis et al. / Intelligence 35 (2007) 283–300

Table 1 (continued)

Reference                                 Test         r      N    Information
McCormick et al. (1983)                   ASVAB     −.73     57    Adults
Kaufman and Kaufman (1983)                K-ABC     −.27     46    Pre-school children
                                                    −.18     36    Pre- and primary-school children
                                                    −.22     70    Primary-school children
Wechsler (1997)                           WAIS-III  −.45    100    Young adults
                                                    −.57    102    Adults
                                                    −.51    104    Adults
                                                     .03     88    Adults
Reeve and Lam (2005)                      EAS       −.34    123    Undergraduate students

In general, the g loadings were based on the correlation matrix taken from the manuals containing the test–retest studies, or on the correlation matrix based on the largest sample size we could find. What follows is a list of the sources of the g loadings when not taken from the manuals containing the test–retest study. van Geffen (1972) and Bosch (1973): de Wolff and Buiten (1963); see also Johnson, te Nijenhuis, and Bouchard (in press). Bleichrodt et al. (1987): te Nijenhuis et al. (2004), who used the same data on which the RAKIT manual is based. van der Doef, Kwint, and van der Koppel (1989): Dutch WISC-R manual. Elliott (1983): Table 9.8, age 9:0–9:11 years. US Dept. of Labor's GATB (1970): Jensen (1985, p. 214), using the largest correlation matrix in the GATB manual. Wechsler (1974), Covin (1977), and Tuma and Appelbaum (1980): Jensen (1985, p. 214, first study). Bennett et al. (1974): average of four highly similar correlation matrices. Matarazzo et al. (1980): Wechsler's (1955, p. 17) Table 8, for ages 25–34. McCormick et al. (1983): Ree and Carretta (1994). Reeve and Lam (2005) utilize SEM analyses and use item parcels instead of full scale scores to compute g loadings; the average g loading of all the item parcels for a specific subtest was taken as the g loading of that specific subtest.


risk of overcorrection, we conservatively chose the value of .90 for the correction.
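Correcting for deviation from perfect construct validity amounts to dividing the already-corrected correlation by the assumed correlation between empirical and “true” g loadings. The minimal sketch below assumes exactly that simple division (it does not reproduce the Schmidt and Le program) and plugs in the true correlation of −.95 reported in the Results section; the function name is hypothetical. It illustrates why .90 is the conservative choice: a smaller assumed validity, such as the .85 estimated above, inflates the corrected magnitude further.

```python
def correct_for_construct_validity(rho: float, validity: float) -> float:
    """Disattenuate a correlation for imperfect construct validity of the g vector.

    Assumes the simple correction rho / a, where a is the assumed correlation
    between empirical and "true" g loadings.
    """
    if not 0.0 < validity <= 1.0:
        raise ValueError("validity must lie in (0, 1]")
    return rho / validity


if __name__ == "__main__":
    rho_after_four_artifacts = -0.95  # true correlation reported in Table 2
    for a in (0.90, 0.85):
        corrected = correct_for_construct_validity(rho_after_four_artifacts, a)
        print(f"a = {a:.2f}: corrected rho = {corrected:.3f}")
```

With a = .90 the corrected value rounds to −1.06, the figure reported in the Results.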

9 Results

The results of the studies on the correlation between g loadings and gain scores are shown in Table 1. The table gives data derived from 64 studies, with participants numbering a total of 26,990. It lists the reference for each study, the cognitive ability test used, the correlation between g loadings and gain scores, the sample size, and background information on the study. It is clear that virtually all correlations are negative and that the few positive correlations are very small.
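The first columns of Table 2 (K, N, the sample-size-weighted mean r, and SDr) follow from Table 1 alone. The sketch below recomputes them from the transcribed (r, N) pairs. It is a bare-bones calculation only, with none of the corrections for unreliability or range restriction that the authors applied with the Schmidt and Le program. The text does not name the three excluded outliers; the sketch assumes they are the three positive correlations, an assumption consistent with the reported N of 26,704 (26,990 − 49 − 149 − 88). The names TABLE_1 and bare_bones are illustrative.

```python
"""Bare-bones meta-analytic summary of Table 1 (no artifact corrections)."""

# (r, N) pairs transcribed from Table 1, in table order.
TABLE_1 = [
    (-0.57, 100),                                          # Drenth et al. (1968)
    (-0.45, 42), (-0.21, 42),                              # van Geffen (1972)
    (-0.07, 43),                                           # Bosch (1973)
    (-0.42, 96),                                           # Schroots and van Alphen de Veer (1979)
    (0.09, 49), (-0.25, 51), (-0.21, 49),                  # Bleichrodt et al. (1987)
    (-0.69, 22),                                           # van der Doef et al. (1989)
    (-0.23, 46), (-0.42, 25),                              # Mulder et al. (2004)
    (-0.15, 42), (-0.26, 67), (-0.46, 39),                 # Kort et al. (2005)
    (-0.51, 44),                                           # Luteijn and Barelds (2005)
    (-0.63, 60),                                           # Kooij et al. (2005)
    (-0.65, 60),                                           # Elliott (1983)
    (-0.46, 50),                                           # Wechsler (1967)
    # United States Department of Labor (1970), GATB
    (-0.35, 156), (-0.66, 605), (-0.70, 554),
    (-0.58, 223), (-0.41, 186), (-0.50, 202), (-0.52, 152),
    (-0.67, 156), (-0.61, 168), (-0.43, 176), (0.02, 149),
    (-0.62, 157), (-0.32, 136), (-0.69, 119), (-0.31, 183),
    (-0.96, 118), (-0.75, 170), (-0.75, 123), (-0.48, 183),
    (-0.92, 3398), (-0.92, 3680), (-0.91, 3348),
    (-0.91, 3491), (-0.84, 3229), (-0.87, 3395),
    (-0.48, 97), (-0.66, 102), (-0.21, 104),               # Wechsler (1974)
    (-0.79, 92), (-0.53, 81), (-0.29, 81), (-0.62, 100),   # Bennett et al. (1974)
    (-0.57, 30),                                           # Covin (1977)
    (-0.08, 45),                                           # Tuma and Appelbaum (1980)
    (-0.10, 29),                                           # Matarazzo et al. (1980)
    (-0.64, 71), (-0.48, 48),                              # Wechsler (1981)
    (-0.73, 57),                                           # McCormick et al. (1983)
    (-0.27, 46), (-0.18, 36), (-0.22, 70),                 # Kaufman and Kaufman (1983)
    (-0.45, 100), (-0.57, 102), (-0.51, 104), (0.03, 88),  # Wechsler (1997)
    (-0.34, 123),                                          # Reeve and Lam (2005)
]


def bare_bones(pairs):
    """Return (K, total N, N-weighted mean r, N-weighted SD of observed r)."""
    k = len(pairs)
    n_total = sum(n for _, n in pairs)
    mean_r = sum(r * n for r, n in pairs) / n_total
    var_r = sum(n * (r - mean_r) ** 2 for r, n in pairs) / n_total
    return k, n_total, mean_r, var_r ** 0.5


if __name__ == "__main__":
    # Assumption: the three outliers are the three positive correlations.
    subsets = {
        "All": TABLE_1,
        "All - 3 outliers": [p for p in TABLE_1 if p[0] < 0],
    }
    for label, pairs in subsets.items():
        k, n, mean_r, sd_r = bare_bones(pairs)
        print(f"{label}: K = {k}, N = {n}, mean r = {mean_r:.2f}, SD r = {sd_r:.2f}")
```

The weighted means round to −.80 and −.81, as in Table 2; the printed SDs can be compared with the SDr column.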

Table 2 shows the results of the psychometric meta-analysis of the 64 data points. It shows (from left to right) the number of correlation coefficients (K), the total sample size (N), the mean observed correlation (r) and its standard deviation (SDr), and the true correlation one can expect once artifactual error from unreliability in the g vector and the d vector and from range restriction in the g vector has been removed (ρ), together with its standard deviation (SDρ). The next two columns present the percentage of variance explained by artifactual errors (%VE) and the 95% credibility interval (95% CI). This interval denotes the values one can expect for ρ in 19 out of 20 cases.

Table 2
Meta-analysis results for correlations between g loadings and gain scores after corrections for reliability and restriction of range

Studies included    K    N        r      SDr    ρ      SDρ    %VE   95% CI
All                 64   26,990   −.80   .20    −.95   .11    81    −.74 to −1.16
All − 3 outliers    61   26,704   −.81   .18    −.95   .03    99    −.91 to −1.00

K = number of correlations; N = total sample size; r = mean observed correlation (sample size weighted); SDr = standard deviation of observed correlation; ρ = true correlation (observed correlation corrected for unreliability and range restriction); SDρ = standard deviation of true correlation; %VE = percentage of variance accounted for by artifactual errors; 95% CI = 95% credibility interval.

The large number of data points and the very large sample size indicate that we can have confidence in the outcomes of this meta-analysis. The estimated true correlation has a value of −.95, and 81% of the variance in the observed correlations is explained by artifactual errors. However, Hunter and Schmidt (1990) state that extreme outliers should be left out of the analyses, because they are most likely the result of errors in the data. They also argue that strong outliers artificially inflate the SD of effect sizes and thereby reduce the amount of variance that artifacts can explain. We chose to leave out three outliers (more than 4 SD below the average r and more than 8 SD below ρ), comprising 1% of the research participants. This resulted in no change in the value of the true correlation, a large decrease of 74% in the SD of ρ, and a large increase, by 22%, in the amount of variance in the observed correlations explained by artifacts. So when the three outliers are excluded, artifacts explain virtually all of the variance in the observed correlations. Finally, a correction for deviation from perfect construct validity in g took place, using a conservative value of .90. This resulted in a value of −1.06 for the final estimated true correlation between g loadings and score gains. Applying several corrections in a meta-analysis may lead to correlations that exceed 1.00 in absolute value, as is the case here. Percentages of variance accounted for by artifacts larger than 100% are also not uncommon in psychometric meta-analysis, and they also occur in other methods of statistical estimation (see Hunter & Schmidt, 1990, pp. 411–414, for a discussion).
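The 95% credibility intervals in Table 2 can be approximated as ρ ± 1.96·SDρ. The sketch below assumes that convention and uses the rounded ρ and SDρ values from the table, so its bounds (about −1.17 to −.73 for all studies, and about −1.01 to −.89 without the outliers) differ slightly from the printed ones, possibly because the authors worked from unrounded values. The function name is hypothetical.

```python
def credibility_interval(rho: float, sd_rho: float, z: float = 1.96):
    """Return (lower, upper) bounds of rho +/- z * SD_rho (z = 1.96 for 95%)."""
    half_width = z * sd_rho
    return rho - half_width, rho + half_width


if __name__ == "__main__":
    # Rounded (rho, SD_rho) values from Table 2.
    for label, rho, sd_rho in (("All", -0.95, 0.11), ("All - 3 outliers", -0.95, 0.03)):
        lower, upper = credibility_interval(rho, sd_rho)
        print(f"{label}: {lower:.2f} to {upper:.2f}")
```

The much narrower second interval is the numerical counterpart of the drop in SDρ after the outliers are removed.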

10 Discussion

A large-scale meta-analysis of 64 test–retest studies shows that, after corrections for several artifacts, there is an estimated true correlation of −1.06 between the g loadings of tests and score gains, and that virtually all of the variance in observed correlations is attributable to these artifacts. As several artifacts explain virtually all the variance in the effect sizes, other dimensions on which the studies differ, such as age of the test takers, test–retest interval, test used, and whether samples had average IQ or learning problems, play no role at all.

The estimated true correlation of −1.06 is the result of various corrections for artifacts that attenuate the correlations. The estimated values of the artifacts may underestimate or overestimate the population values of the artifacts. Therefore, estimates of true effect sizes may overestimate or underestimate the population values of the effect size. As a solution to this problem, Hunter and Schmidt (2004) suggest carrying out several meta-analyses on the same construct and taking the average estimated effect size of all meta-analyses. The general idea is that meta-analysis is a powerful research tool but does not give perfect outcomes.

A correlation of −1.06 falls outside the range of admissible values for a correlation, but one has to distinguish between the meta-analytical estimate of the true correlation between g and d and the true correlation between g and d itself. We interpret the value of −1.06 for the meta-analytical estimate as meaning that the true correlation between g and d is −1.00. A correlation of −1.00 means that there is a perfect inverse relationship between g and score gains: the tests with the highest g loadings show the smallest gains. The most straightforward interpretation of this very large negative correlation is that there is no g saturation in test–retest gain scores.

11 The South African learning potential study

In a carefully carried-out study, Skuy et al. (2002) used a dynamic testing procedure to see whether it would improve the scores of Black South African students on Raven's Standard Progressive Matrices (RSPM). The Bantu Education Act of 1954 established a discriminatory educational system characterized by poorly qualified teachers, sparsely equipped and funded schools, and generally poor quality. Most Black students in the sample had not received the same quality of education as White students. Black, White, Indian, and Colored research participants took the RSPM on two occasions, and in between, randomly constituted experimental groups were exposed to the Mediated Learning Experience. Both the Black South African group and the group consisting of White, Indian, and Colored South Africans improved over their baseline on the RSPM, and the Black group showed greater improvement.

The value of these cognitive interventions increases when the score gains transfer to other tests and to external criteria, such as school or work achievement. Therefore, the research participants also took Feuerstein's Representational Stencil Design Test as a transfer measure. The subject is presented with a stencil of a geometric design and is then asked to point to which stencils need to be used, and in what sequence, in order to construct an identical design. Like the RSPM, the Stencils test requires representational/abstract thinking, but the training on the RSPM showed little transfer to it. Moreover, the correlation of the RSPM scores with performance in the end-of-year psychology examination did not significantly improve after mediation. Once again, the score gains were empty: they did not generalize. Skuy et al. go on to ask what it was that their interventions improved. Professor Skuy made his data accessible to the present authors, so we could perform additional analyses.

12 Sample

The data from Skuy et al. (2002) were used, with the exception of data from three research participants whose pretest IQ scores were extremely low (more than 3 SDs below the group mean). Ninety-five university students in psychology, aged 16 to 29 (mean age = 20, SD = 2.3; 25 males, 70 females), participated in this study. They were 66 Black students (20 males, 46 females) and 29 White (20), Indian (6), and Colored (3) students (5 males, 24 females). The mean age of the Black group was 20 (SD = 2.5) and of the White/Indian/Colored group 19 years (SD = 1). Subjects were randomly assigned to the experimental group (n = 55) and to the control group (n = 40).


13 Procedure

The students participated in pre- and posttest phases, with a group intervention in between. The study focused on improvement in scores on the RSPM, using the Set Variations II of the Learning Propensity Assessment Device as the mediation task. Mediation training took 3 h and was conducted by three experienced psychologists with the assistance of six postgraduate psychology students. A detailed description is given in Skuy et al. (2002).

14 Measures and cognitive intervention

The Raven's Standard Progressive Matrices consists of 60 items (divided into 5 sets of 12 items) designed to measure the ability to form comparisons, to reason by analogy, and to organize spatial information into related wholes. It has been established as one of the purest measures of g (Jensen, 1998a). Skuy et al. (2002) found no evidence for test bias against Blacks in South African education. Rushton, Skuy, and Bons (2004) showed that the Raven's gave comparable predictive validities for students from various groups. Cross-cultural testing research has clearly shown that insufficient proficiency in the language of the test can lead to biased assessments in tests with a strong verbal component. However, the Raven's is a non-verbal test.

The Learning Propensity Assessment Device consists of 14 exercises. Each exercise contains an initial mediation task. Subsequent tasks increase in complexity and novelty and aim to assist the learner to achieve mastery over the task. The purpose of mediation is to assist the learner to develop the appropriate cognitive strategies and functions needed for the successful completion of the task. The Set Variations II of the Learning Propensity Assessment Device consists of five sets of items, which comprise variations of Sets C, D, and E of the RSPM test. Each set of variations contains a learning task for the purpose of initial mediation, followed by a series of progressively more difficult variations to which the skills learned must be applied. Mediation involves discussing with groups how to define the problem to be solved, focus on the task, set rules, regulate problem-solving behavior, and identify the correct sequence of logical steps needed to solve the task. Mediation also involves helping the subject to develop appropriate concepts, verbal tools, and insights in relation to the task. A detailed description is given in Skuy et al. (2002).

15 Statistical analyses

Although the Skuy et al. study is among the South African learning potential studies with the largest sample size, the N is not large. We therefore chose basic statistical analyses.

15.1 Descriptive statistics

Means, standard deviations, and reliabilities were computed for the various groups. With regard to measures of effect size, Hunter and Schmidt (1990, p. 271) advise choosing estimates of variance with the least error. Because repeated test takings tend to change the size of the SD (Ackerman, 1987), we chose the SD of the pretests for the denominator. The correlation between scores before and after the training was computed to see whether the training had an effect on the rank order of individuals' scores.
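Applied to the total-score means in Table 4, the effect size described above is the pre-to-post gain divided by the pretest SD. The sketch below, with an illustrative function name, reproduces the four tabled effect sizes (0.95, 0.43, 0.93, and 0.77) and, assuming the conventional IQ metric with SD = 15, converts each d into IQ points, which gives roughly 14 points for the Black experimental group as mentioned in the Discussion.

```python
def effect_size(pre_mean: float, post_mean: float, pre_sd: float) -> float:
    """Standardized gain: (posttest mean - pretest mean) / pretest SD."""
    if pre_sd <= 0:
        raise ValueError("pretest SD must be positive")
    return (post_mean - pre_mean) / pre_sd


# (pretest M, posttest M, pretest SD) per group, taken from Table 4.
TABLE_4 = {
    "Black experimental": (43.78, 50.10, 6.64),
    "Black control": (45.46, 48.35, 6.69),
    "Other experimental": (50.20, 55.80, 6.05),
    "Other control": (52.71, 55.36, 3.45),
}

if __name__ == "__main__":
    for group, (pre, post, sd) in TABLE_4.items():
        d = effect_size(pre, post, sd)
        # IQ-point equivalent assumes an IQ metric with SD = 15.
        print(f"{group}: d = {d:.2f} (about {15 * d:.0f} IQ points)")
```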

15.2 Correlation between score gains and g loadedness

Because our sample was not large and quite specific, estimates of g loadedness were taken from Lynn, Allik, and Irwing's (2004) item analysis of the RSPM in Estonia, which used a large (N = 2735), nationally representative sample. The same reasoning as in psychometric meta-analysis applies, namely that larger samples give better estimates of g loadings than smaller samples. In a hierarchical factor analysis of the items using structural equation modeling, Lynn et al. computed g loadings for 52 of the 60 items. In the present study, Pearson correlations were calculated between the g loadings of these 52 items and the effect sizes on these items.
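The vector correlation is an ordinary Pearson correlation computed across items, but only over items that have both a g loading and an effect size (here, the 52 items with loadings from Lynn et al.). The sketch below makes that pairing explicit by keying both vectors on item number. The loadings and gains in it are invented for illustration and are not the Lynn et al. or Skuy et al. values; the function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vector_correlation(g_loadings, effect_sizes):
    """Correlate g loadings with effect sizes over the items present in both.

    Both arguments map item number -> value. Returns (number of items used, r).
    """
    common = sorted(set(g_loadings) & set(effect_sizes))
    return len(common), pearson([g_loadings[i] for i in common],
                                [effect_sizes[i] for i in common])


# Invented values for illustration; item 4 has an effect size but no loading.
G_LOADINGS = {1: 0.30, 2: 0.45, 3: 0.50, 5: 0.62, 6: 0.70}
GAINS = {1: 0.40, 2: 0.31, 3: 0.28, 4: 0.90, 5: 0.15, 6: 0.10}

if __name__ == "__main__":
    k, r = vector_correlation(G_LOADINGS, GAINS)
    print(f"items used: {k}, r = {r:.2f}")
```

Item 4 is silently dropped, mirroring how only 52 of the 60 RSPM items could enter the analysis.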

15.3 g loadings

The RSPM consists of dichotomous items, so we computed a matrix of polychoric correlations (Nunnally & Bernstein, 1994). A principal axis factor analysis was carried out. The percentage of variance explained by the first unrotated factor was taken as an estimate of g loadedness. Because sample size was limited, we collapsed the experimental and the control group.
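The g-loadedness estimate described above can be sketched in plain Python. The sketch assumes the (polychoric) correlation matrix has already been estimated, since fitting polychoric correlations needs a dedicated routine. It runs iterated principal axis factoring for the first factor only, uses power iteration for the leading eigenpair, and starts from the largest absolute off-diagonal correlation in each row as the initial communality (squared multiple correlations are another common choice). The percentage reported is the first factor's sum of squared loadings divided by the number of variables. The function names are illustrative, not those of any statistics package.

```python
import math


def dominant_eigenpair(m, max_iter=1000, tol=1e-13):
    """Power iteration for the leading eigenvalue and unit eigenvector.

    Assumes the dominant eigenvalue is the largest positive one, as it is
    when all correlations in the matrix are positive.
    """
    n = len(m)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(max_iter):
        w = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
        converged = max(abs(a - b) for a, b in zip(w, v)) < tol
        v = w
        if converged:
            break
    mv = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
    return sum(a * b for a, b in zip(v, mv)), v  # Rayleigh quotient, eigenvector


def first_factor_paf(r, max_iter=500, tol=1e-10):
    """Iterated principal axis factoring, first unrotated factor only.

    r is a correlation matrix (list of lists). Returns the loadings and the
    percentage of total variance explained by the factor.
    """
    n = len(r)
    # Initial communalities: largest absolute off-diagonal correlation per row.
    h2 = [max(abs(r[i][j]) for j in range(n) if j != i) for i in range(n)]
    loadings = [0.0] * n
    for _ in range(max_iter):
        # Reduced correlation matrix: communalities replace the 1s on the diagonal.
        reduced = [[h2[i] if i == j else r[i][j] for j in range(n)] for i in range(n)]
        eigenvalue, vector = dominant_eigenpair(reduced)
        loadings = [math.sqrt(max(eigenvalue, 0.0)) * x for x in vector]
        new_h2 = [a * a for a in loadings]
        converged = max(abs(a - b) for a, b in zip(new_h2, h2)) < tol
        h2 = new_h2
        if converged:
            break
    percent = 100.0 * sum(a * a for a in loadings) / n
    return loadings, percent


if __name__ == "__main__":
    # Toy one-factor correlation matrix, r_ij = l_i * l_j (not RSPM data).
    true_loadings = [0.8, 0.7, 0.6, 0.5]
    corr = [[1.0 if i == j else true_loadings[i] * true_loadings[j]
             for j in range(4)] for i in range(4)]
    loadings, percent = first_factor_paf(corr)
    print([round(x, 3) for x in loadings], round(percent, 1))
```

On this exact one-factor matrix the iteration should recover loadings close to .8, .7, .6, and .5 and about 43.5% explained variance, since (.64 + .49 + .36 + .25) / 4 = .435.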

15.4 Correlation between sum scores and score gains

We tested whether individuals with low g improved their scores more than those with high g by correlating gain scores with pretest RSPM scores for each of the four research groups. As gain scores tend to be negatively correlated with pretest scores as a function of unreliability (see Cronbach, 1990; Nunnally & Bernstein, 1994), we corrected the correlations using Tucker, Damarin, and Messick's (1966) formula 63. Using the formula, one adds to each correlation the term (SD pretest / SD gain score) × (1 − reliability pretest).

Table 3
Proportion of sample selecting the correct answer on items of Raven's Standard Progressive Matrices, by group

Set A                 Set B                 Set C                 Set D                 Set E
Item  Black  Other(a) Item  Black  Other    Item  Black  Other    Item  Black  Other    Item  Black  Other
   1   1.00   1.00      13   1.00   1.00      25   1.00    .97      37   1.00   1.00      49    .74    .90
   2    .97   1.00      14   1.00   1.00      26    .96   1.00      38    .99   1.00      50    .64    .90
   3    .97   1.00      15   1.00   1.00      27    .96   1.00      39    .89   1.00      51    .79    .97
   4   1.00    .97      16    .91    .97      28    .86    .93      40    .92   1.00      52    .56    .83
   5   1.00   1.00      17    .96    .97      29    .94    .97      41    .96   1.00      53    .52    .83
   6    .99   1.00      18    .85   1.00      30    .76    .83      42    .92   1.00      54    .35    .76
   7    .94    .97      19    .77    .66      31    .88    .97      43    .77   1.00      55    .42    .79
   8    .91    .93      20    .79    .97      32    .50    .79      44    .76    .93      56    .21    .69
   9   1.00    .97      21    .83    .97      33    .74    .90      45    .71    .97      57    .30    .41
  10    .91    .97      22    .92   1.00      34    .61    .79      46    .79    .93      58    .12    .41
  11    .83    .90      23    .80    .90      35    .53    .69      47    .29    .41      59    .02    .17
  12    .68    .83      24    .59    .83      36    .06    .35      48    .26    .38      60    .11    .21

(a) Other = White, Indian, and Colored.
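Why the Tucker, Damarin, and Messick adjustment from Section 15.4 has this form: if the pretest is true score plus error and the gain is posttest minus pretest, the pretest error enters the gain with a negative sign. When the posttest error is independent of it, that error contributes −(1 − reliability) × SD²pretest to the covariance, or −(SD pretest / SD gain) × (1 − reliability) to the correlation, and the correction adds that amount back. The sketch below implements the adjustment as stated in the text; the function name and the example inputs are hypothetical.

```python
def tdm_corrected_r(r_gain_pretest: float, sd_pretest: float, sd_gain: float,
                    reliability_pretest: float) -> float:
    """Add back the spurious negative component caused by pretest measurement error.

    Implements r + (SD_pretest / SD_gain) * (1 - reliability_pretest), the
    adjustment described in the text (attributed there to Tucker, Damarin, &
    Messick, 1966).
    """
    if sd_gain <= 0:
        raise ValueError("SD of gain scores must be positive")
    return r_gain_pretest + (sd_pretest / sd_gain) * (1.0 - reliability_pretest)


if __name__ == "__main__":
    # Hypothetical inputs, not taken from the Skuy et al. data.
    print(round(tdm_corrected_r(-0.60, sd_pretest=5.0, sd_gain=4.0,
                                reliability_pretest=0.80), 2))
```

With these made-up inputs the added term is 1.25 × .20 = .25, so an observed −.60 becomes −.35.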

16 Results

16.1 Descriptive statistics

Internal consistencies (Cronbach's αs) on the RSPM ranged from .76 to .86 for the pre- and posttests, respectively. Table 3 shows the proportion of each of the groups that selected the correct answer on each of the 60 items of the pretest. Across the 60 items, the order of the p values was almost identical for Blacks and White/Indian/Coloreds (r = .92, p = .00).
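The near-identical ordering of item difficulties can be checked directly against Table 3. The text reports r = .92 without saying whether it is a Pearson or a rank-order coefficient, so the sketch below computes both from the transcribed pretest proportions, giving tied values average ranks for the Spearman version. It is not claimed to reproduce the reported value exactly; the function names are illustrative.

```python
import math

# Pretest proportions correct for items 1-60, transcribed from Table 3.
BLACK = [
    1.00, .97, .97, 1.00, 1.00, .99, .94, .91, 1.00, .91, .83, .68,  # Set A
    1.00, 1.00, 1.00, .91, .96, .85, .77, .79, .83, .92, .80, .59,  # Set B
    1.00, .96, .96, .86, .94, .76, .88, .50, .74, .61, .53, .06,    # Set C
    1.00, .99, .89, .92, .96, .92, .77, .76, .71, .79, .29, .26,    # Set D
    .74, .64, .79, .56, .52, .35, .42, .21, .30, .12, .02, .11,     # Set E
]
OTHER = [  # White, Indian, and Colored students
    1.00, 1.00, 1.00, .97, 1.00, 1.00, .97, .93, .97, .97, .90, .83,
    1.00, 1.00, 1.00, .97, .97, 1.00, .66, .97, .97, 1.00, .90, .83,
    .97, 1.00, 1.00, .93, .97, .83, .97, .79, .90, .79, .69, .35,
    1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, .93, .97, .93, .41, .38,
    .90, .90, .97, .83, .83, .76, .79, .69, .41, .41, .17, .21,
]


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Ranks 1..n, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson r of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    print(f"Pearson r = {pearson(BLACK, OTHER):.2f}")
    print(f"Spearman rho = {spearman(BLACK, OTHER):.2f}")
```

As a side check, the column sums of the two lists (44.46 and 51.49) are close to the pretest means of 44.44 and 51.41 reported for the two groups, as they should be, since the sum of item p values equals the mean total score.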

Table 4 shows the means and standard deviations for the total RSPM scores for the four groups, along with the d effect sizes representing the difference between pre- and posttest scores (Cohen, 1988). First, we examined whether there was an effect of race (Black vs. White/Indian/Colored) and group (experimental vs. control) on the pretest scores. There was a significant effect due to race (F(1, 91) = 24.13, p = .00, η² = .21), but not group (F(1, 91) = 2.28, p = .14, η² = .02). This means that mean pretest scores of Blacks (M = 44.44, SD = 6.65) were lower than those of White/Indian/Coloreds (M = 51.41, SD = 5.05), and that mean pretest scores of experimental and control groups were comparable (M = 45.53, SD = 7.04 and M = 48, SD = 6.7, respectively).

Secondly, we investigated the effects of training on the posttest scores by performing a two-way ANCOVA on the total posttest scores, with race and group as factors and the total pretest scores as the covariate. There was a significant effect for group (F(1, 95) = 13.81, p = .00, η² = .13) and for race (F(1, 90) = 3.99, p = .05, η² = .04), but not for the two-way interaction of group and race (F(1, 90) = 0.28, p = .60, η² = .00). These results indicate that the training was equally effective for both the Black and the White/Indian/Colored students. Posttest scores of Blacks (M = 49.41, SD = 5.91), however, remained significantly lower (F(1, 91) = 28.33, p = .00) than those of Whites/Indians/Coloreds (M = 55.59, SD = 3.55). Although posttest scores of the experimental group (M = 51.65, SD = 5.53) were higher than those of the control group (M = 50.8, SD = 6.65), the difference between the groups was nonsignificant (F(1, 91) = 0.85, p = .36).

Table 4
Pre- and posttest mean Raven's scores, standard deviations, and mean effect sizes for Black and White/Indian/Colored students

                  Black experimental    Black control         Other(a) experimental   Other(a) control
                  (n = 40)              (n = 26)              (n = 15)                (n = 14)
                  Pretest  Posttest     Pretest  Posttest     Pretest  Posttest       Pretest  Posttest
Raw scores M      43.78    50.10        45.46    48.35        50.20    55.80          52.71    55.36
SD                 6.64     5.31         6.69     6.71         6.05     3.76           3.45     3.43
Percentile        14       41           16       31           41       75             55       68
Effect size        0.95                  0.43                  0.93                    0.77

Percentiles are based on US adult norms; see Raven, Raven, and Court (2000), Table SPM13.
(a) Other = White, Indian, and Colored.
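The combined-group means quoted above are sample-size-weighted averages of the Table 4 cell means; for example, (40 × 43.78 + 26 × 45.46) / 66 ≈ 44.44 for the Black pretest mean. The sketch below, with illustrative names, recovers all eight of them. It covers the means only; the ANCOVA results require the raw scores and are not reproduced.

```python
# (n, pretest M, posttest M) for each (race, condition) cell, from Table 4.
CELLS = {
    ("Black", "experimental"): (40, 43.78, 50.10),
    ("Black", "control"): (26, 45.46, 48.35),
    ("Other", "experimental"): (15, 50.20, 55.80),
    ("Other", "control"): (14, 52.71, 55.36),
}
PRETEST, POSTTEST = 1, 2  # positions within each cell tuple


def pooled_mean(factor: int, level: str, phase: int) -> float:
    """N-weighted mean over the cells whose key[factor] equals level.

    factor 0 = race, 1 = condition; phase selects pretest or posttest means.
    """
    cells = [v for k, v in CELLS.items() if k[factor] == level]
    total_n = sum(v[0] for v in cells)
    return sum(v[0] * v[phase] for v in cells) / total_n


if __name__ == "__main__":
    for factor, levels in ((0, ("Black", "Other")), (1, ("experimental", "control"))):
        for level in levels:
            pre = pooled_mean(factor, level, PRETEST)
            post = pooled_mean(factor, level, POSTTEST)
            print(f"{level}: pretest M = {pre:.2f}, posttest M = {post:.2f}")
```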

The correlation between scores before and after the training was .84 (p = .00) for the experimental group and .90 (p = .00) for the control group, showing that the training had only a limited effect on the rank order of individuals' scores. This means that the test strongly, but not perfectly, measures the same constructs on both occasions.

16.2 Correlation between score gains and g loadedness

We estimated effect sizes for each of the four groups (race by condition) by computing the difference between mean pretest scores and posttest scores, divided by the standard deviation of the pretest scores of Black and White/Indian/Colored students, respectively. Finally, we calculated the correlations between effect sizes and the g loadings taken from Lynn et al. Correlations were −.24 (p = .10) for the Black experimental group, −.21 (p = .20) for the White/Indian/Colored experimental group, −.08 (p = .59) for the Black control group, and −.41 (p = .01) for the White/Indian/Colored control group. Small sample sizes usually attenuate correlations (Hunter & Schmidt, 1990). Collapsing the groups indeed resulted in higher average correlations: −.39 for the complete experimental group and −.26 for the complete control group.
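For a dichotomous item, a natural pretest SD is sqrt(p(1 − p)), where p is the pretest proportion correct. The text does not spell out the exact item-level denominator, so the sketch below is only an illustration under that assumption, with a hypothetical function name and invented proportions. It also flags a practical point visible in Table 3: items that everyone answered correctly on the pretest (p = 1.00) have zero SD, so their d is undefined and they must be dropped before correlating d with the g loadings.

```python
import math


def item_effect_size(p_pre: float, p_post: float):
    """Gain on a dichotomous item in pretest-SD units, with SD = sqrt(p * (1 - p)).

    Returns None when the pretest SD is zero (p_pre of 0 or 1): d is undefined there.
    """
    sd_pre = math.sqrt(p_pre * (1.0 - p_pre))
    if sd_pre == 0.0:
        return None
    return (p_post - p_pre) / sd_pre


if __name__ == "__main__":
    # Invented pretest/posttest proportions correct for four items.
    items = {1: (1.00, 1.00), 2: (0.60, 0.75), 3: (0.30, 0.50), 4: (0.10, 0.20)}
    for item, (pre, post) in items.items():
        d = item_effect_size(pre, post)
        print(item, "undefined (dropped)" if d is None else round(d, 2))
```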

16.3 g loadings

Using the combined experimental and control group, a principal axis factor analysis on the pretest and posttest scores, respectively, resulted in a first unrotated factor explaining 22% of the variance in the pretest scores and 18% of the variance in the posttest scores. These findings suggest that the g loadedness of the RSPM decreased substantially after the Mediated Learning Experience.

16.4 Correlation between score gains and sum score

Correlating score gains with RSPM total scores resulted in values of −.60 (p = .00) for the Black experimental group, −.18 (p = .38) for the Black control group, −.82 (p = .00) for the White/Indian/Colored experimental group, and −.48 (p = .08) for the White/Indian/Colored control group. After the use of the correction formula of Tucker et al. (1966), these correlations became −.39, −.08, −.61, and −.35, respectively. Overall, these correlations show that low-g persons improved their scores more strongly than high-g persons.

17 Discussion

Skuy et al. (2002) hypothesized that the low-quality education of Blacks in South Africa would lead IQ tests to underestimate their cognitive abilities. Groups of Black and White/Indian/Colored students took the Raven's Progressive Matrices twice and in between received Feuerstein's Mediated Learning Experience. The test scores went up substantially in all groups. Evidence for an authentic change in the g factor requires broad transfer or generalizability across a wide variety of cognitive performance. However, Skuy et al. show that the gains did not generalize to scores on another, highly similar test or to external criteria, and were therefore hollow. As the score gains were in some cases quite large (14 IQ points for the Black experimental group), the question becomes: what is it that improved?

The findings show that the correlations between score gains and g loadedness of the items were −.39 for the complete experimental group and −.26 for the complete control group. However, because the g loadings and gain scores are measured at the item level, their reliabilities are not high, resulting in substantial attenuation of the correlation between g and d. Moreover, the RSPM does not measure g perfectly: Jensen (1998a, p. 91) estimates its g loading at .83. When we estimate the reliability of the g vector at .70 and the reliability of the gain score vector at .50, corrections for unreliability and for deviation from perfect construct validity of g only would result in estimated true correlations of −.80 and −.53, respectively. These values should be taken as underestimates; controlling for additional artifacts will bring them closer to the very strong negative correlation found in the meta-analysis.
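The corrected values above can be approximated by dividing the observed correlation by the square root of the product of the two vector reliabilities and then by the RSPM g loading of .83. The sketch assumes this is how the two corrections were combined (the text does not give the formula) and uses a hypothetical function name. From the rounded inputs it yields about −.79 and −.53; the small gap to the reported −.80 may reflect rounding of the observed correlation.

```python
import math


def disattenuate(r_observed: float, rel_g: float, rel_d: float, g_validity: float) -> float:
    """Correct a g-by-d vector correlation for unreliability of both vectors and
    for imperfect construct validity of g: r / (sqrt(rel_g * rel_d) * g_validity)."""
    return r_observed / (math.sqrt(rel_g * rel_d) * g_validity)


if __name__ == "__main__":
    # Reliabilities (.70, .50) and RSPM g loading (.83) as stated in the text.
    for label, r in (("experimental", -0.39), ("control", -0.26)):
        print(label, round(disattenuate(r, rel_g=0.70, rel_d=0.50, g_validity=0.83), 2))
```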

The findings suggest that after training the g loadedness of the test decreased substantially. We found substantial negative correlations between gain scores and RSPM total scores. Table 4 shows that the total score variance decreased after training, which is in line with low-g subjects increasing more than high-g subjects. Since, as a rule, high-g individuals profit the most from training (as is reflected in the ubiquitous positive correlation between IQ scores and training performance; Jensen, 1980; Schmidt & Hunter, 1998), these findings could be interpreted as an indication that Feuerstein's Mediated Learning Experience is not g-loaded, in contrast with regular trainings, which are clearly g-loaded. Substantial negative correlations between gain scores and RSPM total scores are no definite proof of this hypothesis, but they are in line with it. Additional substantiation of our hypothesis that the Feuerstein training has little or no g loadedness comes from Coyle (2006), who showed that gain scores loaded virtually zero on the g factor. Moreover, Skuy et al. reported that the predictive validity of their measure did not increase when the second Raven score was used. The fact that individuals with low g gained more than those with high g could be interpreted as an indication that the Mediated Learning Experience was not g-loaded. It should be noted, however, that Feuerstein most likely did not intend his intervention to be g-loaded. He was interested in increasing the performance of low scorers on both tests and external criteria.

18 General discussion

IQ scores are by far the best general predictor of success in education, job training, and work. However, there are many ways in which these IQ scores can be increased, for instance by means of retesting or participating in a learning potential training program. What conclusions can be drawn from such score gains? Jensen's (1998a) hypothesis that the effects of training on abilities can be summarized in terms of Carroll's three-stratum hierarchical factor model was tested in a meta-analysis of test–retest data using Dutch, British, and American test batteries, and with learning potential data from South Africa using Raven's Progressive Matrices. The meta-analysis convincingly shows that test–retest score gains are not g-loaded. The findings from the learning potential study are clearly in line with this: when the attenuation caused by unreliability and other artifacts is taken into account, the correlation between g loadings of items and gains on items has a value that is somewhat comparable to the one found in the meta-analysis for test batteries. The data suggest that the g loadedness of item scores decreases after the intervention training. Te Nijenhuis et al.'s (2001) finding that practice and coaching reduced the g loadedness of their test scores strengthens the present findings using item scores. The findings show that it is not the high-g participants who increase their scores the most, as is common in training situations, but the low-g persons who show the largest increases. This suggests that the intervention training is not g-loaded.

Our findings fit quite well with the hierarchical model of intelligence. The generalizability of test scores resides predominantly in the g component, whereas the test-specific ability component and the narrow ability component are virtually non-generalizable. This is evidenced, for instance, by the earlier finding that adding verbal tests to a g score, or numerical tests to a g score, resulted in only a very small incremental validity (Ree & Earles, 1991; Ree et al., 1994). Additionally, Ericsson and Lehmann (1996) reported immense gains for a memory task focusing on one narrow ability, but did not find any improvement for comparable memory tasks focusing on another narrow ability. As the score gains are not related to g, the generalizable g component decreases, and since it is not unlikely that the Feuerstein training itself is not g-loaded, it is easy to understand why the score gains did not generalize to scores on the cognitively loaded Representational Stencil Design Test. For a similar reason, the score gains did not generalize to g-loaded external criteria: the correlation of the RSPM scores with performance in the end-of-year psychology examination did not significantly improve after mediation. Reeve and Lam (2005) claimed that retesting does not change the nature of what is being tested, but our findings suggest the opposite.

19 Limitations of the studies

Our meta-analysis and our analysis of the South African study are strongly based on the method of correlated vectors (MCV), which has recently been shown to have limitations. Dolan and Lubke (2001) have shown that, when comparing groups, substantial positive vector correlations can still be obtained even when groups differ not only on g but also on factors uncorrelated with g. Ashton and Lee (2005) show that associations of a variable with non-g sources of variance can produce a vector correlation of zero even when the variable is strongly associated with g. They suggest that the g loadings of a subtest are sensitive to the nature of the other subtests in a battery, so that a specific sample of subtests may cause a spurious correlation between the vectors. Notwithstanding these limitations, studies using MCV continue to appear (see, for instance, Colom, Haier, & Jung, in press; Hartmann, Kruuse, & Nyborg, in press; Lee et al., 2006). The outcomes of our meta-analysis of a large number of studies using MCV may make an interesting contribution to the discussion on the limitations of the method.

A principle of meta-analysis is that the amount of information contained in one individual study is quite modest. Therefore, one should carry out an analysis of all studies on one topic and correct for artifacts, leading to a strong increase in the amount of information. The fact that our meta-analytical value of r = −1.06 is virtually identical to the theoretically expected correlation between g and d of −1.00 holds some promise that a psychometric meta-analysis of studies using MCV is a powerful way of reducing some of the limitations of MCV. An alternative methodological approach is to limit oneself to the rare datasets enabling the use of structural equation modeling. However, from a meta-analytical point of view, these studies yield only a quite modest amount of information.

Additional meta-analyses of studies employing MCV are necessary to establish the validity of the combination of MCV and psychometric meta-analysis. Most likely, many would agree that a high positive meta-analytical correlation between measures of g and measures of another construct implies that g plays a major role, and that a meta-analytical correlation of −1.00 implies that g plays no role. However, it is not clear what value of the meta-analytical correlation to expect from MCV when g plays only a modest role. After the present meta-analysis on a construct that clearly has an inverse relationship with g, it would be informative to carry out meta-analyses of studies on variables that are strongly linked to g and of variables that are modestly linked to g. An example of the latter would be secular score gains, which, according to Lynn's (1990) nutrition theory, should be modestly g-loaded.

The sample sizes in the South African study are not large, but still larger than those in many other studies of learning potential, where an N ≈ 10 is not unusual. A reanalysis of the many existing studies on dynamic testing could lead to a meta-analysis with a large combined N. The mean posttest score was quite high, so a ceiling effect may have taken place for the White/Indian/Colored group, leading to an underestimation of the experimental score gain for this group.

Instead of testing the hypothesis with a strongly unidimensional test such as the RSPM, it would be better to use a multidimensional test. Moreover, a large sample size would allow the use of more rigorous data-analytical techniques, leading to more definitive results. However, to the best of our knowledge, datasets meeting these requirements do not exist, and the Skuy et al. study is arguably the best South African learning potential study.

20. Score gains as low-quality measures of motivation

As criterion-related validity is strongly dependent on g, te Nijenhuis et al.'s finding of lowered g loadings after training should result in lowered criterion-related validity. However, the empirical findings show the opposite: virtually all test–retest and test preparation studies on cognitive tests and scholastic aptitude tests that reported both criterion-related validities demonstrate small to modest increases in criterion-related validity for the second or third test score (see Allalouf & Ben-Shakhar, 1998; Bashi, 1976; Coyle, 2006; Hausknecht, Trevor, & Farr, 2002; Jones, 1986; Linn, 1977; Olsen & Schrader, 1959; Ortar, 1960; Powers, 1985; Reeve & Lam, 2005). In the carefully designed study of a university entrance test by Allalouf and Ben-Shakhar (1998), the experimental group received an intensive 40-h test coaching program, while the control group did not. The criterion-related validity for the retest increased for both groups. Most importantly, the increase was the same: it was not larger for the experimental group.

In a little-known but carefully designed large-scale learning potential study, Resing (1990; see Table 4.23) compared an experimental group that received a pretest, a learning potential training, and a posttest against a control group that received only the pretest and the posttest. The mean criterion-related validity of the various second scores was .62 for both the experimental and the control group. Learning potential training did not result in incremental criterion-related validity over and above the validity resulting from simply retesting. The findings of both Resing and Allalouf and Ben-Shakhar suggest that cognitive interventions do not increase criterion-related validity more than simple retesting does.
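Whether posttest validity differs between an experimental and a control group can be checked with the standard Fisher r-to-z test for two independent correlations. The sketch below is illustrative only: the group sizes are hypothetical (Resing's Ns are not given in this section), and the paper does not state that this test was applied.

```python
import math


def compare_independent_correlations(r1, n1, r2, n2):
    """Fisher r-to-z test for the difference between two independent rs.

    Returns (z, two-sided p). Requires n1 > 3 and n2 > 3.
    """
    z1, z2 = math.atanh(r1), math.atanh(r2)  # Fisher transformation
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided standard-normal p-value
    return z, p


# Equal validities of .62 in both groups; group sizes are hypothetical.
print(compare_independent_correlations(0.62, 120, 0.62, 110))  # (0.0, 1.0)
```

Identical validities give z = 0 whatever the sample sizes; the test only becomes informative when the two coefficients differ.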

g and the personality measure conscientiousness have been shown to make an excellent combination of predictors (Schmidt & Hunter, 1998). Conscientiousness represents, among other characteristics, persistence, a will to achieve, and the ability to focus effort on the goal. A field study on test preparation using actual job applicants (Clause, Delbridge, Schmitt, Chan, & Jennings, 2001) showed that motivation to perform well on the test correlated .25 with test performance. One could speculate that score increases do not reflect a true cognitive component, but rather become low-quality measures of motivation. Further, since the increase in validity due to retesting and learning potential training is modest in comparison to the large increase obtainable from the use of personality questionnaires, personality testing might provide a less expensive and more accurate alternative.
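How much a second predictor adds to g can be quantified with the standard formula for the multiple correlation of a criterion with two predictors. The validities below are round illustrative numbers, not values reported in this paper or in Schmidt and Hunter (1998). The sketch only shows that when the second predictor is uncorrelated with g, its squared validity adds directly to R².

```python
import math


def multiple_r_two_predictors(r_y1, r_y2, r_12):
    """Multiple correlation of a criterion y with two predictors x1, x2.

    R^2 = (r_y1^2 + r_y2^2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12^2)
    """
    r_squared = (r_y1**2 + r_y2**2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12**2)
    return math.sqrt(r_squared)


# Illustrative validities only: g = .50, conscientiousness = .30, uncorrelated.
r_g, r_c, r_gc = 0.50, 0.30, 0.0
R = multiple_r_two_predictors(r_g, r_c, r_gc)
print(f"R = {R:.3f}, gain over g alone = {R - r_g:.3f}")  # R = 0.583, gain over g alone = 0.083
```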

21. Effectiveness of various training formats

Components of the mediation training used by Skuy et al. (2002) are similar to the test training used in te Nijenhuis et al. (2001). Both the Dutch training and the South African training took 3 h, but whereas the Dutch training focused on two different test formats, the South African training dealt with only one test format. The test training by Lloyd and Pidgeon (1961) took even less time, namely two half-hour segments, each focusing on one test format. The effect sizes in all studies were roughly comparable. This suggests that the methodologies employed by te Nijenhuis et al. and by Lloyd and Pidgeon were more efficient than those used by Skuy et al. It is possible that the components of the mediation training that are not present in the other two training formats are not effective in raising test scores and could therefore be left out. If so, it might be possible to increase scores on the RSPM by one SD with a relatively simple 1-h training.
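The efficiency comparison above can be made explicit by dividing a standardized gain by training duration. In this minimal sketch, the means and SD are invented, and the gain is expressed in pretest SD units, which is one common effect-size choice; the exact effect-size definition used in the cited studies is not given in this section. The durations (3 h and 1 h) come from the paragraph above, and an equal effect size is assumed for both, mirroring the statement that effect sizes were roughly comparable.

```python
def standardized_gain(mean_pre, mean_post, sd_pre):
    """Mean gain expressed in pretest SD units (one common effect-size choice)."""
    return (mean_post - mean_pre) / sd_pre


def gain_per_hour(d, hours):
    """Effect size obtained per hour of training."""
    return d / hours


# Hypothetical raw-score summary, assumed identical for both training formats.
d = standardized_gain(mean_pre=40.0, mean_post=46.0, sd_pre=6.0)
print(d)                                    # 1.0
print(f"{gain_per_hour(d, hours=3.0):.2f}")  # 0.33 (3-h training)
print(f"{gain_per_hour(d, hours=1.0):.2f}")  # 1.00 (1-h training)
```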

22. Generalizability of findings

Can these findings of hollow score gains after test–retest, test practice, and Mediated Learning Experience training be generalized to other studies in which training-induced score gains were found? Ericsson and Lehmann (1996) reported tremendous score increases after intensive training on numeric memory tests, but these gains did not generalize in the least to verbal memory tests. Such gains on one narrow ability do not generalize to another narrow ability clustering under the same broad ability, and are therefore hollow. Similarly, Jensen (1998b) showed that score gains due to adoption were not on the g factor and were therefore most likely hollow.

Rushton (1999) argued that intergenerational score gains are not linked to g, suggesting that the Flynn effect may be empty, but he was strongly criticized by Flynn (1999, 2000). In studies on the Flynn effect, the score gains found in cross-sectional studies are largest on the RSPM (Flynn, 1987). Lynn (1998) has suggested that a substantial part of these intergenerational score gains on the RSPM is generalizable (they do reflect higher g), but that the remaining part is hollow and should be interpreted as schooling effects. The RSPM does require the application of the mathematical principles of addition, subtraction, progression, and the distribution of values. In the three decades (1950s–1980s) over which these increases in RSPM scores occurred, increasing proportions of 15- to 18-year-olds remained in school, where they learned math skills that they applied to the solution of matrices problems. Our findings could be interpreted as support for Lynn's hypothesis of the partial hollowness of score gains on the RSPM. Notwithstanding the high g loading of the RSPM sum score, it is quite sensitive to test–retest effects and training effects. Some studies on the Flynn effect (Lynn & Hampson, 1986; Teasdale & Owen, 1989) show that the increase in scores is largely concentrated in the lower segments of the IQ distribution. Our finding that low scorers show the largest gains after training may additionally support the notion that part of the Flynn effect on the RSPM is hollow. Finally, Wicherts et al.'s (2004) findings show that in some of their datasets the secular score gains are most strongly linked to broad, narrow, and test-specific abilities, indicating that an important part of the gains is non-generalizable.

Ceci (1991) showed that increased schooling leads to higher IQ scores, but are these gains highly specific or predominantly generalizable? It would be interesting to apply the techniques we used in this study to the findings of previous intervention studies. It may be that biological interventions (such as diet, vitamin supplements, and vaccination against infectious disease), rather than psychological or educational interventions, are the most cost-effective method of producing true changes in g and broad abilities. There may be a biological barrier between the first and the second stratum that restricts the effects of behavioral interventions to narrow abilities and test specificities.

Acknowledgement

We would like to thank Mervyn Skuy for his permission to use his dataset.

Thanks to Marié de Beer, Raegan Murphy, Welko Tomic, Art Jensen, and Frank Schmidt for feedback on previous versions of this paper.

Thanks to Arne Evers, Wilma Resing (Dutch Test Committee), and Andress Kooij (Harcourt) for also helping to locate test–retest studies.

References

Ackerman, P. L. (1986). Individual differences in information processing: An investigation of intellectual abilities. Intelligence, 10, 101–139.

Ackerman, P. L. (1987). Individual differences in skill learning: An integration of psychometric and information processing skills. Psychological Bulletin, 102, 3–27.

Allalouf, A., & Ben-Shakhar, G. (1998). The effect of coaching on the predictive validity of scholastic aptitude tests. Journal of Educational Measurement, 35(1), 31–47.

Ashton, M. C., & Lee, K. (2005). Problems with the method of correlated vectors. Intelligence, 33, 431–444.

Bashi, Y. (1976). Verbal and non-verbal abilities of 4th, 6th, and 8th grade students in the Arab educational system in Israel. Jerusalem: Hebrew University, School of Education.

Bleichrodt, N., Resing, W. C. M., Drenth, P. J. D., & Zaal, J. N. (1987). Intelligentie-meting bij kinderen: Empirische en methodologische verantwoording van de geReviseerde Amsterdamse Kinder Intelligentie Test [Measuring the intelligence of children: Empirical and methodological justification of the Revised Amsterdam Children Intelligence Test]. Lisse, the Netherlands: Swets.

Bennett, G. K., Seashore, H. G., & Wesman, A. G. (1974). Differential Aptitude Tests (5th ed.). Manual. New York: The Psychological Corporation.

Boeyens, J. C. A. (1989). Learning potential: An empirical investigation. Pretoria, South Africa: Human Science Research Council.

Bosch, F. (1973). Inventarisatie, beschrijving en onderzoek m.b.t. de wijzigingen van de GATB, incl. test-hertest onderzoek (No. Pz3bRp0120) [Stock-taking, description, and research concerning the modifications of the GATB, includes test–retest study]. Utrecht, the Netherlands: Nederlandse Spoorwegen.

Carroll, J. B. (1993). Human cognitive abilities: A survey of factor analysis studies. Cambridge University Press.

Ceci, S. J. (1991). How much does schooling influence general intelligence and its cognitive components? A reassessment of the evidence. Developmental Psychology, 27, 703–722.

Christian, K., Bachnan, H. J., & Morrison, F. J. (2001). Schooling and cognitive development. In R. J. Sternberg & E. L. Grigorenko (Eds.), Environmental effects on cognitive abilities (pp. 287–335). Mahwah, NJ: Erlbaum.

Clause, C. S., Delbridge, K., Schmitt, N., Chan, D., & Jennings, D. (2001). Test preparation activities and employment test performance. Human Performance, 14, 149–167.

Cohen, J. (1988). Statistical power analysis for the behavioral sciences. Hillsdale: Lawrence Erlbaum.

Colom, R., Jung, R. E., & Haier, R. J. (in press). Finding the g-factor in brain structure using the method of correlated vectors. Intelligence.

Covin, T. A. (1977). Stability of the WISC-R for 9-year-olds with learning difficulties. Psychological Reports, 40, 1297–1298.

Coyle, T. R. (2006). Test–retest changes on scholastic aptitude tests are not related to g. Intelligence, 34, 15–27.

Cronbach, L. J. (1990). Essentials of psychological testing. New York: HarperCollins.

de Villiers, A. B. (1999). Disadvantaged students' academic performance: Analysing the zone of proximal development. Unpublished DPhil thesis, University of Cape Town, South Africa.

de Wolff, C. J., & Buiten, B. (1963). Een factoranalyse van vier testbatterijen [A factor analysis of four test batteries]. Nederlands Tijdschrift voor Psychologie, 18, 220–239.

Dolan, C. V., & Lubke, G. (2001). Viewing Spearman's hypothesis from the perspective of multigroup PCA: A comment on Schönemann's criticism. Intelligence, 29, 231–245.

Drenth, P. J. D., Petrie, J. F., & Bleichrodt, N. (1968). Handleiding bij de Amsterdamse Kinder Intelligentie Test [Manual of the Amsterdam Children Intelligence Test]. Amsterdam: Vrije Universiteit.

Elliott, C. D. (1983). British Ability Scales. Manual 2: Technical handbook. Windsor, Great Britain: NFER-Nelson.

Engelbrecht, M. (1999). Leerpotensiaal as voorspeller van akademiese sukses van universiteitsstudente [Learning potential as predictor of the academic success of university students]. Unpublished DPhil thesis, Potchefstroom University for Christian Higher Education, South Africa.

Ericsson, K. A., & Lehmann, A. C. (1996). Expert and exceptional performance: Evidence of maximal adaptation to task constraints. Annual Review of Psychology, 47, 273–305.

Evers, A., & Lucassen, W. (1991). Handleiding DAT 83: Differentiële Aanleg Testserie [Manual DAT 83: Differential Aptitude Test series]. Amsterdam: Swets.

Fleishman, E. A., & Hempel, W. E. (1955). The relation between abilities and improvement with practice in a visual discrimination reaction task. Journal of Experimental Psychology, 49, 301–312.

Flynn, J. R. (1987). Massive IQ gains in 14 nations: What IQ tests really measure. Psychological Bulletin, 101, 171–191.

Flynn, J. R. (1999). Evidence against Rushton: The genetic loading of WISC-R subtests and the causes of between-group IQ differences. Personality and Individual Differences, 26, 373–379.

Flynn, J. R. (2000). IQ gains, WISC subtests and fluid g: g theory and the relevance of Spearman's hypothesis to race. In G. R. B. J. Goode (Ed.), The nature of intelligence (pp. 202–227). New York: Wiley.

Gaydon, V. P. (1988). Predictors of performance of disadvantaged adolescents on the Soweto/Alexandra gifted child programme. Unpublished MEd dissertation, University of the Witwatersrand, South Africa.

Gottfredson, L. S. (1997). Why g matters: The complexity of everyday life. Intelligence, 24(1), 79–132.

Gottfredson, L. S. (2002). g: Highly general and highly practical. In R. J. Sternberg & E. L. Grigorenko (Eds.), The general intelligence factor: How general is it? (pp. 331–380). Mahwah, NJ: Erlbaum.

Grigorenko, E. L., & Sternberg, R. J. (1998). Dynamic testing. Psychological Bulletin, 124, 75–111.

Haeck, W., Yeld, N., Conradie, J., Robertson, N., & Shall, A. (1997). A developmental approach to mathematics testing for university admissions and course placement. Educational Studies in Mathematics, 33, 71–91.

Hartmann, P., Kruuse, N. H. S., & Nyborg, H. (in press). Testing the cross-racial generality of Spearman's hypothesis in two samples. Intelligence.

Hausknecht, J. P., Trevor, C. O., & Farr, J. L. (2002). Retaking ability tests in a selection setting: Implications for practice effects, training performance, and turnover. Journal of Applied Psychology, 87(2), 243–254.

Hunter, J. E., & Schmidt, F. L. (1990). Methods of meta-analysis. London: Sage.

Hunter, J. E., & Schmidt, F. L. (2004). Methods of meta-analysis (2nd ed.). London: Sage.

Jensen, A. R. (1980). Bias in mental testing. London: Methuen.

Jensen, A. R. (1985). The nature of the black–white difference on various psychometric tests: Spearman's hypothesis. Behavioral and Brain Sciences, 8, 193–263.

Jensen, A. R. (1998a). The g factor: The science of mental ability. London: Praeger.

Jensen, A. R. (1998b). Adoption data and two g-related hypotheses. Intelligence, 25, 1–6.

Johnson, W., Bouchard, T. J., Jr., Krueger, R. F., McGue, M., & Gottesman, I. I. (2004). Just one g: Consistent results from three test batteries. Intelligence, 32, 95–107.

Johnson, W., te Nijenhuis, J., & Bouchard, T. J., Jr. (in press). Replication of the hierarchical visual-perceptual-image rotation model in de Wolff and Buiten's (1963) battery of 46 tests of mental ability. Intelligence.

Jones, R. J. (1986). A comparison of the predictive validity of the MCAT for coached and uncoached students. Journal of Medical Education, 61, 335–338.

Kaufman, A. S., & Kaufman, N. L. (1983). K-ABC: Kaufman Assessment Battery for Children. Interpretive manual. Circle Pines, MN: AGS.

Kooij, A. P., Rolfhus, E., Wilkins, C., Yang, Z., & Zhu, J. (2005). WAIS-III Nederlandstalige bewerking: Technisch rapport hernormering [WAIS-III adaptation in Dutch: Technical report renorming]. Amsterdam: Harcourt.

Kort, W., Schittekatte, M., Dekker, P. H., Verhaeghe, P., Compaan, E. L., Bosmans, M., & Vermeir, G. (2005). WISC-III NL: Wechsler Intelligence Scale for Children, Derde Editie NL. Handleiding en verantwoording [The Dutch WISC-III: Wechsler Intelligence Scale for Children, Third Edition for the Netherlands. Manual and justification]. Amsterdam: NIP.

Kulik, J. A., Bangert-Drowns, R. L., & Kulik, C. C. (1984). Effectiveness of coaching for aptitude tests. Psychological Bulletin, 95, 179–188.

Kulik, J. A., Kulik, C. C., & Bangert, R. L. (1984). Effects of practice on aptitude and achievement test scores. American Educational Research Journal, 21, 435–447.

Lee, K. H., Choi, Y. Y., Gray, J. R., Cho, S. H., Chae, J.-H., Lee, S., et al. (2006). Neural correlates of superior intelligence: Stronger recruitment of posterior parietal cortex. NeuroImage, 29(2), 578–586.

Linn, R. L. (1977). On the treatment of multiple scores for Law School Admission Test repeaters (Report LSAC-77-4). In Law School Admission Council, Reports of LSAC sponsored research: Volume III, 1975–1977. Princeton, NJ: Law School Admission Council.

Lipson, L. E. (1992). Relationship of static and dynamic measures to scholastic achievement of black pupils. Unpublished MEd dissertation, University of the Witwatersrand, South Africa.

Lloyd, F., & Pidgeon, D. A. (1961). An investigation into the effects of coaching on non-verbal test material with European, Indian and African children. British Journal of Educational Psychology, 31, 145–151.

Luteijn, F., & Barelds, D. P. H. (2005). GIT2: Groninger Intelligentie Test 2 [GIT2: Groningen Intelligence Test 2]. Amsterdam: Harcourt.

Lynn, R. (1990). The role of nutrition in secular increases in intelligence. Personality and Individual Differences, 11, 273–285.

Lynn, R. (1998). In support of the nutrition theory. In U. Neisser (Ed.), The rising curve: Long-term gains in IQ and related measures (pp. 207–215). Washington, DC: American Psychological Association.

Lynn, R., Allik, J., & Irwing, P. (2004). Sex differences on three factors identified in Raven's Standard Progressive Matrices. Intelligence, 32, 411–424.

Lynn, R., & Hampson, S. (1986). The rise of national intelligence: Evidence from Britain, Japan and the USA. Personality and Individual Differences, 7, 23–32.

Matarazzo, J. D., Carmody, T. P., & Jacobs, L. D. (1980). Test–retest reliability and stability of the WAIS: A literature review with implications for clinical practice. Journal of Clinical Neuropsychology, 2(2), 89–105.

McCormick, B. K., Dunlap, W. P., Kennedy, R. S., & Jones, M. B. (1983). The effects of practice on the Armed Forces Vocational Aptitude Test Battery. US Army Research Institute for the Behavioral and Social Sciences, Technical Report 602.

Mulder, J. L., Dekker, R., & Dekker, P. H. (2004). Kaufman Intelligentietest voor adolescenten en volwassenen (KAIT): Handleiding [Kaufman Intelligence Test for adolescents and adults (KAIT): Manual]. Leiden, the Netherlands: PITS.

Murphy, R. (2002). A review of South African research in the field of dynamic assessment. Unpublished MA dissertation, University of Pretoria (available online from http://upetd.up.ac.za/thesis/available/etd-05042002-161239).

Nel, A. (1997). Die voorspelling van akademiese sukses binne konteks van 'n alternatiewe universiteitstoelatingsbeleid [The prediction of academic success within the context of an alternative policy of university admission]. Unpublished MA dissertation, Rand Afrikaans University, South Africa.

Neubauer, A. C., & Freudenthaler, H. H. (1994). Reaction time in a sentence-picture verification test and intelligence: Individual strategies and effects of extended practice. Intelligence, 19, 193–218.

Nunnally, J. C., & Bernstein, I. H. (1994). Psychometric theory (3rd ed.). New York: McGraw-Hill.

Olsen, M., & Schrader, W. B. (1959). The use of preliminary and final Scholastic Aptitude Test scores in predicting college grades (College Entrance Examination Board Research and Development Reports and Statistical Reports 59-19). Princeton, NJ: Educational Testing Service.

Ortar, G. R. (1960). Improving test validity by coaching. Educational Research, 2, 137–142.

Powers, D. E. (1985). Effects of test preparation on the validity of Graduate Admission Test. Applied Psychological Measurement, 9, 179–190.

Raven, J., Raven, J. C., & Court, J. H. (2000). Standard Progressive Matrices: Raven manual, Section 3. Oxford Psychologists Press.

Ree, M. J., & Carretta, T. R. (1994). The correlation of general cognitive ability and psychomotor tracking tests. International Journal of Selection and Assessment, 2, 209–216.

Ree, M. J., & Earles, J. A. (1991). Predicting training success: Not much more than g. Personnel Psychology, 44, 321–332.

Ree, M. J., Earles, J. A., & Teachout, M. S. (1994). Predicting job performance: Not much more than g. Journal of Applied Psychology, 79, 518–524.

Reeve, C. L., & Lam, H. (2005). The psychometric paradox of practice effects due to retesting: Measurement invariance and stable ability estimates in the face of observed score changes. Intelligence, 33, 535–549.

Resing, W. C. M. (1990). Intelligentie en leerpotentieel: Een onderzoek naar het leerpotentieel van jonge leerlingen uit het basis- en speciaal onderwijs [Intelligence and learning potential: A study into the learning potential of young students in basic and special education]. Amsterdam, the Netherlands: Swets.

Rushton, J. P. (1999). Secular gains in IQ are not related to the g factor and inbreeding depression, unlike black–white differences: A reply to Flynn. Personality and Individual Differences, 26, 381–389.

Rushton, J. P., Skuy, M., & Bons, T. A. (2004). Construct validity of Raven's Advanced Progressive Matrices for African and non-African engineering students in South Africa. International Journal of Selection and Assessment, 12(3), 220–229.

Schmidt, F. L., & Hunter, J. E. (1998). The validity and utility of selection methods in personnel psychology: Practical and theoretical implications of 85 years of research findings. Psychological Bulletin, 124(2), 262–274.

Schmidt, F. L., & Hunter, J. E. (1999). Theory testing and measurement error. Intelligence, 27(3), 183–198.

Schmidt, F. L., & Le, H. (2004). Software for the Hunter-Schmidt meta-analysis methods. University of Iowa, Department of Management and Organization, Iowa City, IQ 42242.

Schroots, J. J. F., & van Alphen de Veer, R. J. (1979). LDT: Leidse Diagnostische Test. Deel 1: Handleiding [LDT: Leiden Diagnostic Test. Part 1: Manual]. Lisse, the Netherlands: Swets.

Shochet, I. M. (1986). Manifest and potential performance in advantaged and disadvantaged students. Unpublished DPhil dissertation, University of the Witwatersrand, South Africa.

Skuy, M., Gewer, A., Osrin, Y., Khunou, D., Fridjon, P., & Rushton, J. P. (2002). Effects of mediated learning experience on Raven's Matrices scores of African and non-African university students in South Africa. Intelligence, 30, 221–232.

Swanson, H. E., & Lussier, C. M. (2001). A selective synthesis of the experimental literature on dynamic assessment. Review of Educational Research, 71, 321–363.

Teasdale, T. W., & Owen, D. R. (1989). Continuing secular increase in intelligence and a stable prevalence of high intelligence levels. Intelligence, 13, 255–262.

Tuma, J. M., & Appelbaum, A. S. (1980). Reliability and practice effects of WISC-R IQ estimates in a normal population. Educational and Psychological Measurement, 40, 671–678.

te Nijenhuis, J., Tolboom, E., Resing, W., & Bleichrodt, N. (2004). Does cultural background influence the intellectual performance of children from immigrant groups? Validity of the RAKIT intelligence test for immigrant children. European Journal of Psychological Assessment, 20, 10–26.

te Nijenhuis, J., & van der Flier, H. (1997). Comparability of GATB scores for immigrants and majority group members: Some Dutch findings. Journal of Applied Psychology, 82, 675–687.

te Nijenhuis, J., Voskuijl, O. F., & Schijve, N. B. (2001). Practice and coaching on IQ tests: Quite a lot of g. International Journal of Selection and Assessment, 9, 302–308.

Thorndike, R. L. (1985). The central role of general ability in prediction. Multivariate Behavioral Research, 20, 241–254.

Tucker, L. R., Damarin, F., & Messick, S. (1966). A base-free measure of change. Psychometrika, 31(4), 457–473.

van der Doef, M. P., Kwint, J. M., & van der Koppel (1989). Wat leren moeilijk lerende kinderen van de WISC-R? [What do children who have difficulties in learning learn from the WISC-R?]. Kind en Adolescent, 10, 136–141.

United States Department of Labor. (1970). Manual for the USTES General Aptitude Test Battery. Section III: Development. Washington, DC: United States Department of Labor.

van Geffen (1972). De betrouwbaarheid van de GATB 1002-B op brugklasniveau [The reliability of the GATB 1002-B for the first class at secondary school]. Catholic University Nijmegen, the Netherlands: Psychology of Work and Organisation.

van Haasen, P. P., de Bruyn, E. E. J., Pijl, Y. J., Poortinga, Y. H., Lutje Spelberg, H. C., Vander Steene, G., et al. (1986). WISC-R: Wechsler Intelligence Scale for Children-Revised, Nederlandstalige uitgave [WISC-R: Wechsler Intelligence Scale for Children-Revised, Dutch edition]. Lisse, the Netherlands: Swets.

Wechsler, D. (1955). Manual for the Wechsler Adult Intelligence Scale. New York: The Psychological Corporation.

Wechsler, D. (1967). Manual for the Wechsler Preschool and Primary Scale of Intelligence. New York: The Psychological Corporation.

Wechsler, D. (1974). Manual for the Wechsler Intelligence Scale for Children-Revised. New York: The Psychological Corporation.

Wechsler, D. (1981). WAIS-R manual: Wechsler Adult Intelligence Scale-Revised. New York: The Psychological Corporation.

Wechsler, D. (1997). WAIS-III: Wechsler Adult Intelligence Scale-third edition, and WMS-III: Wechsler Memory Scale-third edition. Technical manual. New York: The Psychological Corporation.

Wicherts, J. W., Dolan, C. V., Oosterveld, P., van Baal, G. C. V., Boomsma, D. I., & Span, M. M. (2004). Are intelligence tests measurement invariant over time? Investigating the nature of the Flynn effect. Intelligence, 32(5), 509–537.

Yeld, N., & Haeck, W. (1997). Educational histories and academic potential: Can tests deliver? Assessment and Evaluation in Higher Education, 22, 5–16.

Zaaiman, H. (1998). Selecting students for Mathematics and Science: The challenge facing higher education in South Africa. Pretoria, South Africa: HSRC Publishers.

Zaaiman, H., van der Flier, H., & Thijs, G. D. (2001). Dynamic testing in selection for an educational programme: Assessing South African performance on the Raven Progressive Matrices. International Journal of Selection and Assessment, 9, 258–269.

Zolezzi, S. A. (1992). Alternative selection measures for university undergraduate admissions. Unpublished MEd dissertation, University of the Witwatersrand, South Africa.

Zolezzi, S. A. (1995). The effectiveness of dynamic assessment as an alternative aptitude testing strategy. Unpublished DPhil dissertation, University of South Africa, South Africa.

instance, on a memory task very similar to the Forward Digit Span subtest of the WISC. It is clear that IQ scores can be increased by training. The question is what inferences can be drawn from these gains. Do they represent true increases in mental ability, or simply in performance on a particular test instrument?

2. Jensen's hypothesis: score gains can be summarized in the hierarchical intelligence model

Jensen (1998a, ch. 10) hypothesized that the effects of training on abilities can be summarized in terms of Carroll's (1993) three-stratum hierarchical factor model of cognitive abilities. At the highest level of the hierarchy (stratum III) is general intelligence, or g. One level lower (stratum II) are the broad abilities: Fluid Intelligence, Crystallized Intelligence, General Memory and Learning, Broad Visual Perception, Broad Auditory Perception, Broad Retrieval Ability, and Broad Cognitive Speediness or General Psychomotor Speed. One level lower still (stratum I) are the narrow abilities, such as Sequential Reasoning, Quantitative Reasoning, Verbal Abilities, Memory Span, Visualization, and Perceptual Speed. At the lowest level of the hierarchy are large numbers of specific tests and subtests. Some tests, despite seemingly very different formats, have been demonstrated empirically to cluster into one narrow ability (Carroll, 1993).

It is hypothesized that a training effect is most clearly manifested at the lowest level of the hierarchy of intelligence, namely on the specific tests that most resemble the trained skills. One hierarchical level higher, the training effect is still evident for certain narrow abilities, depending on the nature of the training. However, the gain virtually disappears at the level of broad abilities and is altogether undetectable at the highest level, g. This implies that the transfer of training effects is strongly limited to tests or tasks that are all dominated by one particular narrow skill or ability. There is virtually no transfer across tasks dominated by different narrow abilities, and transfer disappears completely before reaching the level of g. Thus, there is an increase in narrow abilities or test-specific ability that is independent of g. Test-specific ability is defined as that part of a given test's true-score variance that is not common to any other test; i.e., it lacks the power to predict performance on any other tasks except those that are highly similar. Gains on test specificities are therefore not generalizable, but ‘empty’ or ‘hollow’. Only the g component is highly generalizable. Jensen (1998a, ch. 10) gives various examples of empty score gains, including a detailed analysis of the Milwaukee project, concluding that IQ scores rose but g scores did not. Another example of empty score gains is given by Christian, Bachnan, and Morrison (2001), who state that increases due to schooling show very little transfer across domains.

It is hypothesized that the g loadings of the few tests that are most similar to the trained skills, and therefore most likely to reflect the specific training, diminish after training. That is, after training these particular tests reflect the effect of the specific training rather than the general ability factor.
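One simple way to see why a g-unrelated training effect lowers a subtest's g loading is a one-factor sketch. The assumptions are ours, for illustration: the pretest score is standardized, g has unit variance, and training adds a person-specific gain that is uncorrelated with g and with the pretest score. A gain that is identical for everyone leaves the loading unchanged; only individual differences in the g-unrelated gain inflate the total variance and thereby dilute the correlation with g.

```python
import math


def g_loading_after_training(loading_before, gain_variance):
    """Correlation of a subtest with g after adding a g-unrelated gain.

    Assumes a standardized pretest X = loading * g + u with Var(X) = 1 and
    Var(g) = 1, plus a training gain t uncorrelated with g and with X.
    Then Cov(X + t, g) = loading and Var(X + t) = 1 + Var(t).
    """
    return loading_before / math.sqrt(1.0 + gain_variance)


print(round(g_loading_after_training(0.60, 0.0), 2))   # 0.6 (identical gain for all)
print(round(g_loading_after_training(0.60, 0.44), 2))  # 0.5
```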

It is one of the most firmly established facts in the social sciences that IQ tests have a high degree of predictive validity for educational criteria (Jensen, 1980; Schmidt & Hunter, 1998), meaning that high-g persons virtually always learn more than low-g persons. For instance, Kulik, Kulik, et al.'s (1984) meta-analysis reported practice effects on intelligence tests of 0.80 SD, 0.40 SD, and 0.17 SD for subjects of high, middle, and low ability, respectively. In industrial psychology, the more complex the training or job, the higher the correlation of performance with g (Schmidt & Hunter, 1998). This means that training or job situations, and also educational settings, vary in the degree to which they are g-loaded (Gottfredson, 1997, 2002). However, Ackerman (1987) cites several classical studies on the acquisition of simple skills through often-repeated exercise in which low-g persons made the most progress. These findings could be interpreted as an indication that this specific skill acquisition process is not g-loaded. It may be that some of the various forms of training referred to above also show the largest gains for low-g persons.

There are many ways to test Jensen's hypothesis. Below, we address (1) studies on repeated testing and g loadedness, (2) studies on practice and coaching, and (3) studies on learning potential. The practice studies used a pretest–posttest design, whereas both the coaching and the learning potential studies used a pretest–intervention–posttest design.
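When a pretest–intervention–posttest design is paired with a retest-only control group (as in the Resing, 1990, study), the effect of the intervention beyond mere retesting is the difference between the two groups' mean gains. A minimal sketch with hypothetical raw scores follows; the function name is our own.

```python
from statistics import mean


def net_training_gain(pre_exp, post_exp, pre_ctrl, post_ctrl):
    """Mean gain of the trained group minus mean gain of the control group.

    The control group's gain estimates the pure retest (practice) effect,
    so the difference isolates what the intervention added on top of it.
    """
    gain_exp = mean(post_exp) - mean(pre_exp)
    gain_ctrl = mean(post_ctrl) - mean(pre_ctrl)
    return gain_exp - gain_ctrl


# Hypothetical raw scores: trained group gains 8 points, controls gain 3.
print(net_training_gain([30.0, 34.0], [38.0, 42.0], [31.0, 33.0], [34.0, 36.0]))  # 5.0
```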

3. First test of Jensen's hypothesis: studies on repeated testing and g loadedness

What do we find after repeated test taking? In a classic study by Fleishman and Hempel (1955), as subjects were repeatedly given the same psychomotor tests, the g loading of the tests gradually decreased and each task's specificity increased. Neubauer and Freudenthaler (1994) showed that after 9 h of practice, the g loading of a modestly complex intelligence test dropped from .46 to .39. Te Nijenhuis, Voskuijl, and Schijve (2001) showed that after various forms of test preparation, the g loadedness of their test battery decreased from .53 to .49. Based on the work of Ackerman (1986, 1987), it can be concluded that through practice on cognitive tasks, part of the performance becomes overlearned and automatic; the performance then requires less controlled processing of information, which is reflected in lowered g loadings.

4. Second test of Jensen's hypothesis: studies on practice and coaching

Three studies on practice and coaching have shown increases in test scores that are not related to the g factor. This suggests that the gains are ‘empty’ or ‘hollow’. In the first study, Jensen (1998a, ch. 10) analyzed the effect of practice on the General Aptitude Test Battery (GATB). He found negative correlations ranging from −.11 to −.86 between effect sizes on practice and the tests' g loadings. Therefore, the gains were largest on the least cognitively complex tests. In the second study, te Nijenhuis et al. (2001) found a small correlation of −.08 for test practice and large negative correlations of −.87 for both of their test coaching conditions. Jensen carried out a factor analysis of the various GATB score gains and found two large factors that did not correlate with the g factor extracted from the GATB. Most likely, the score gains are not on the g factor or the broad abilities but on the test specificities, since te Nijenhuis et al. showed that practice and coaching reduce the g loadedness of their tests. In a third study (Coyle, 2006), factor analysis demonstrated that the change in aptitude test scores had a zero loading on the g factor.

So the studies on practice and coaching appear to support the theory. However, since only a few empirical studies have tested the link (or absence thereof) between g loadings and gains in test scores from practice and coaching, replications are required before the conclusion can be firmly established. We therefore combined several such studies with various Dutch, British, and American test batteries into a meta-analysis.

5. Third test of Jensen's hypothesis: studies on learning potential

Jensen hypothesizes that the effects of training are not on g, that the gains are empty, and that training should therefore not lead to increased predictive validity. Learning potential theory leads to the opposite prediction, namely that training leads to higher predictive validity. That the theoretical framework of learning potential does not include the g factor is of no importance here; we focus solely on a prediction from learning potential theory that is opposite to a prediction from Jensen's theory, which is based on a hierarchical intelligence model. Some learning potential training studies report predictive validities of pretest and posttest scores. Based on Jensen's theory, one would predict (1) no higher predictive validity for learning potential tests in comparison with classical cognitive tests, and (2) no increase in predictive validity due to training when posttest scores are used instead of pretest scores. Based on learning potential theory, however, one would predict a substantial increase in predictive validity in both cases. So studies on learning potential constitute a test of Jensen's hypothesis.

A large number of studies have been carried out to check for learning potential beyond IQ scores, generally showing that scores go up substantially after mediation. Apart from theoretical considerations, dynamic tests should show higher criterion-related validities than classical IQ tests to justify their time-consuming procedure. Based on a lengthy review of most of the literature, Grigorenko and Sternberg (1998) concluded that the empirical data do not consistently show higher predictive power for dynamic tests than for traditional tests. Murphy (2002) did an excellent and detailed review of all South African studies on learning potential, including virtually all of those missed by Grigorenko and Sternberg (probably because of difficulty of access). Many studies (Boeyens, 1989; de Villiers, 1999; Engelbrecht, 1999; Gaydon, 1988; Haeck, Yeld, Conradie, Robertson, & Shall, 1997; Lipson, 1992; Nel, 1997; Shochet, 1986; Skuy et al., 2002; Yeld & Haeck, 1997; Zaaiman, 1998; Zaaiman, van der Flier, & Thijs, 2001; Zolezzi, 1992, 1995) used data from the numerous South African university entrance programs that had adopted a dynamic framework for assessing disadvantaged, underprepared students. The aim of these programs was to give underprepared applicants an optimal chance to prove that they have the ability to succeed in further study. Again, it was found that while some South African studies show higher criterion-related validities for learning potential tests, the effect was not consistent.

However, in these studies the learning potential tests were compared with individual tests or with an unweighted combination of a limited number of tests, but generally not with a full test battery and in no case with g scores. g scores have been shown to yield higher predictive validities than individual tests or an unweighted score sum (Ree & Earles, 1991; Ree et al., 1994; Thorndike, 1985). So these were comparisons
in which the cognitive predictor with the highest predictive validity was not used, and in which the dynamic tests were pitted against predictors with substantially lower predictive validities than g. As no direct comparisons were made between learning potential tests and g, it is not possible to conclude that g had the higher predictive validity. However, since a comparison of a learning potential test with one test or with a combination of a limited number of tests generally yields comparable predictive validities, and g scores clearly have higher predictive validities than one test or a combination of a limited number of tests, it is not unlikely that a g score will have a higher predictive validity than a learning potential test score. This suggests that the findings might best be interpreted as tentative support for Jensen's theory.
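The argument above turns on the fact that a composite of several tests predicts a criterion better than a single test. A minimal sketch of that step uses the standard formula for the validity of a weighted sum of standardized tests, r = w′v / √(w′Rw), where v holds the single-test validities, R the test intercorrelations, and w the weights. The function name and all numbers (four tests, each with validity .30, intercorrelated .50) are hypothetical; the sketch illustrates only the composite-versus-single-test part of the argument and does not compute g scores.

```python
import numpy as np

def composite_validity(validities, R, weights=None):
    """Validity of a weighted sum of standardized tests: w'v / sqrt(w'Rw)."""
    v = np.asarray(validities, dtype=float)
    R = np.asarray(R, dtype=float)
    # Unit weights give the unweighted score sum discussed in the text
    w = np.ones_like(v) if weights is None else np.asarray(weights, dtype=float)
    return float(w @ v / np.sqrt(w @ R @ w))

# Hypothetical battery: four tests, each correlating .30 with the criterion
validities = [0.30, 0.30, 0.30, 0.30]
R = np.full((4, 4), 0.50)  # all test intercorrelations .50 ...
np.fill_diagonal(R, 1.0)   # ... and unit variances on the diagonal

single = validities[0]
composite = composite_validity(validities, R)
print(f"single test: {single:.2f}, unit-weighted composite: {composite:.2f}")
```

With these hypothetical numbers the unit-weighted composite reaches about .38, against .30 for any single test.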

So the studies on learning potential appear to support the theory that score gains can be summarized in the hierarchical intelligence model. However, more direct tests of the theory are required, and a learning potential study was therefore reanalyzed.

6. Research questions

The research question of this study is whether score gains from test–retest studies and mediated interventions can be summarized in terms of Carroll's three-stratum hierarchical intelligence model. We examined whether (1) correlations between score gains and the g loadedness of the scores are negative in sign, (2) the g loadedness of scores decreases after mediation, and (3) low-g persons show the largest gains after the mediation training. We carried out a meta-analysis to provide a convincing answer to the first research question. In a more exploratory study on learning potential in South Africa, we sought support for all three research questions.

7. Test–retest studies

To test whether there is a negative correlation between the g loadings of tests and score gains, we carried out a meta-analysis of all test–retest studies of Dutch, British, and American test batteries available in the Netherlands. All studies were simple practice studies (no intervention such as additional coaching took place) and used well-validated tests.

8. Method

Psychometric meta-analysis (Hunter & Schmidt, 1990) aims to estimate what the results of studies would have been if all studies had been conducted without methodological limitations or flaws. The results of perfectly conducted studies would allow a less obstructed view of the underlying construct-level relationships (Schmidt & Hunter, 1999). One of the goals of the present meta-analysis is to obtain a reliable estimate of the true correlation between standardized test–retest score gains (d) and g. Although the construct of g has been thoroughly studied, the construct underlying score gains is less well understood. One of the aims of the present study is to gain a clearer understanding of the construct underlying score gains by linking it to the g nexus. Carrying out a complete meta-analysis of the relationship between d and g would require the collection of a very large number of datasets. However, applying meta-analytical techniques to a sufficiently large number of studies will also lead to a reliable estimate of the true correlation between d and g. We therefore collected a large number of studies that are heterogeneous across various possible moderators.

To obtain a reliable correlation between g and d, we focused on batteries with a minimum of seven subtests. University libraries and test libraries were searched, and several members of the Dutch Testing Commission and test publishers were contacted. We limited ourselves to non-clinical samples without health problems. Only a minority of test manuals report test–retest studies; before 1970 they are especially rare. The search yielded virtually all test–retest studies available in the Netherlands. The GATB manual (1970, ch. 20) reports very large datasets on secondary school children who took the GATB with 1-, 2-, and 3-year intervals, respectively. At the time of the first test, large samples of children who were the same age as the test–retest children would be at the time of the second test also took the test. Comparing the scores allowed the maturation effects to be separated from the test–retest effects, so we included these data in the present study.
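The logic of separating maturation from practice in the GATB data can be sketched as follows. This is a minimal illustration assuming the simplest reading of the design: first-time takers who are as old as the retest group is at its second test stand in for "maturation without practice". The function name and all means and SDs are hypothetical, not values from the GATB manual.

```python
def split_gain(pre_mean, retest_mean, control_mean, pre_sd):
    """Split a test-retest gain into maturation and practice parts, in pretest SD units.

    pre_mean:     test-retest group at the first test (age a)
    retest_mean:  the same group at the retest (age a + k)
    control_mean: first-time takers of age a + k, tested at the first occasion
    pre_sd:       pretest SD
    """
    total = (retest_mean - pre_mean) / pre_sd
    maturation = (control_mean - pre_mean) / pre_sd   # older, but no prior exposure
    practice = (retest_mean - control_mean) / pre_sd  # same age, assumed to differ only in exposure
    return total, maturation, practice

# Hypothetical subtest means and pretest SD
total, maturation, practice = split_gain(pre_mean=100.0, retest_mean=112.0,
                                         control_mean=106.0, pre_sd=20.0)
print(f"total d = {total:.2f}, maturation = {maturation:.2f}, practice = {practice:.2f}")
# total d = 0.60, maturation = 0.30, practice = 0.30
```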

Standardized score gains were computed by dividing the raw score gain by the SD of the pretest. In general, g loadings were computed by submitting a correlation matrix to a principal axis factor analysis and using the loadings of the subtests on the first unrotated factor. In some cases, g loadings were taken from studies in which other procedures were followed; these procedures have been shown empirically to lead to highly comparable results. Pearson correlations between the standardized score gains and the g loadings were then computed.
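The three steps described above (standardized gains, first-factor loadings, Pearson r) can be sketched in Python with NumPy. This is a minimal sketch, not the authors' code: it extracts a single factor by iterated principal axis factoring, starting from squared multiple correlations, whereas the original analyses may have extracted more factors before reading off the first unrotated one. The seven-subtest correlation matrix is built from hypothetical loadings, and the means and SDs are hypothetical as well.

```python
import numpy as np

def first_factor_loadings(R, max_iter=500, tol=1e-10):
    """Loadings on the first unrotated factor from one-factor iterated principal axis factoring."""
    R = np.asarray(R, dtype=float)
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))  # start: squared multiple correlations
    for _ in range(max_iter):
        reduced = R.copy()
        np.fill_diagonal(reduced, h2)             # communalities on the diagonal
        vals, vecs = np.linalg.eigh(reduced)      # eigenvalues in ascending order
        loadings = vecs[:, -1] * np.sqrt(vals[-1])
        new_h2 = loadings ** 2
        converged = np.max(np.abs(new_h2 - h2)) < tol
        h2 = new_h2
        if converged:
            break
    # The sign of an eigenvector is arbitrary; report loadings as mostly positive
    return loadings if loadings.sum() >= 0 else -loadings

# Hypothetical battery of seven subtests with a one-factor structure
lam = np.array([0.80, 0.75, 0.70, 0.60, 0.50, 0.45, 0.40])
R = np.outer(lam, lam)
np.fill_diagonal(R, 1.0)

# Hypothetical pretest/retest means and pretest SDs per subtest
pre_mean = np.array([20.0, 25.0, 30.0, 18.0, 22.0, 15.0, 12.0])
post_mean = pre_mean + np.array([0.5, 0.9, 1.2, 1.5, 2.1, 2.0, 2.7])
pre_sd = np.array([5.0, 6.0, 6.0, 5.0, 6.0, 5.0, 6.0])

g = first_factor_loadings(R)
d = (post_mean - pre_mean) / pre_sd  # standardized score gains
r_gd = np.corrcoef(g, d)[0, 1]       # Pearson correlation across subtests
print(np.round(g, 2), np.round(d, 2), round(r_gd, 2))
```

Because the hypothetical gains were chosen to shrink as the loadings grow, r_gd comes out strongly negative here.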

Psychometric meta-analytical techniques (Hunter & Schmidt, 1990, 2004) were applied to the resulting 64 r_gd values, using the software package developed by Schmidt and Le (2004). Psychometric meta-analysis is based on
the principle that there are artifacts in every dataset and that most of these artifacts can be corrected. In the present study we corrected for five of the artifacts listed by Hunter and Schmidt (1990) that alter the value of outcome measures: (1) sampling error, (2) reliability of the vector of g loadings, (3) reliability of the vector of score gains, (4) restriction of range of g loadings, and (5) deviation from perfect construct validity.

8.1. Correction for sampling error

In many cases sampling error explains the majority of the variation between studies, so the first step in a psychometric meta-analysis is to correct the collection of effect sizes for differences in sample size between the studies.
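The sampling-error step corresponds to what Hunter and Schmidt call a bare-bones meta-analysis. A minimal sketch of the standard formulas follows, assuming each correlation is weighted by its sample size N as listed in Table 1. The five input pairs are the first five r and N values in Table 1, used only as a small worked input, so the result is not the full-sample estimate reported below; the function name is our own.

```python
def bare_bones(rs, ns):
    """Bare-bones psychometric meta-analysis (correction for sampling error only)."""
    total_n = sum(ns)
    r_bar = sum(n * r for r, n in zip(rs, ns)) / total_n                   # N-weighted mean r
    var_obs = sum(n * (r - r_bar) ** 2 for r, n in zip(rs, ns)) / total_n  # observed variance of r
    n_bar = total_n / len(ns)
    var_err = (1 - r_bar ** 2) ** 2 / (n_bar - 1)                          # expected sampling-error variance
    pct_explained = 100 * var_err / var_obs if var_obs > 0 else float("inf")
    return r_bar, var_obs, var_err, pct_explained

rs = [-0.57, -0.45, -0.21, -0.07, -0.42]
ns = [100, 42, 42, 43, 96]
r_bar, var_obs, var_err, pct = bare_bones(rs, ns)
print(f"mean r = {r_bar:.3f}, observed var = {var_obs:.4f}, "
      f"sampling-error var = {var_err:.4f}, explained = {pct:.0f}%")
```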

8.2. Correction for reliability of the vector of g loadings

The value of r_gd is attenuated by the reliability of the vector of g loadings for a given battery. When two samples have a comparable N, the average correlation between their vectors is an estimate of the reliability of each vector. The datasets in the present study included no g vectors for the same battery from different samples, so the artifact distributions were based on other studies reporting g vectors for two or more samples. The effect sizes and the distribution of reliabilities of the g vector were therefore based on different samples. When two g vectors were compared, the correlation between them was used; when more than two g vectors were compared, the average correlation over the various combinations of two vectors was used. The combined N of the samples on which the g vectors were based was taken as the weight of one data point.
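The averaging rule for g-vector reliabilities can be sketched as follows: with two vectors the reliability estimate is simply their correlation, and with more than two it is the mean over all pairs, entered into the artifact distribution with the combined N as its weight. The function name, loadings, and sample sizes are hypothetical.

```python
from itertools import combinations

import numpy as np

def g_vector_reliability(g_vectors, sample_ns):
    """Average correlation over all pairs of g vectors, plus the combined N used as weight."""
    pair_rs = [np.corrcoef(a, b)[0, 1] for a, b in combinations(g_vectors, 2)]
    return float(np.mean(pair_rs)), int(sum(sample_ns))

# Hypothetical g loadings of one six-subtest battery in three samples
g_vectors = [
    [0.72, 0.65, 0.58, 0.70, 0.50, 0.61],
    [0.70, 0.60, 0.62, 0.74, 0.48, 0.55],
    [0.75, 0.63, 0.55, 0.68, 0.52, 0.60],
]
reliability, weight = g_vector_reliability(g_vectors, sample_ns=[400, 350, 420])
print(f"reliability = {reliability:.2f} (weight N = {weight})")
```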

Several samples were compared that differed little on background variables. For the comparisons of children, we chose samples that were highly comparable with regard to age; for the comparisons of adults, we chose samples that were roughly comparable with regard to age. In a study on young children, Schroots and van Alphen de Veer (1979) report correlation matrices for the Leidse Diagnostische Test for eight age groups between 4 and 8 years of age. The average correlation between the g vectors of adjacent age groups is .75 (combined N = 1169). Several studies report data on both younger and older children. The Dutch/Flemish WISC-R (van Haasen et al., 1986) has samples of Dutch and Flemish children with comparable N, so the 11 age groups between 6 and 16 could be compared. This resulted in an average correlation of .78 (combined N = 3018). Jensen (1985) reports g loadings of the 12 subtests of the WISC-R obtained in three large, independent, representative samples of Black and White children. The average correlation between the g vectors obtained for each sample is .86 for the Black children (combined N = 1238) and .93 for the White children (combined N = 2868). In a study on older children, Evers and Lucassen (1991) report the correlation matrices of the Dutch DAT. The average correlation between the g vectors of three educational groups is .88 (combined N = 3300). The US GATB manual (1970, ch. 20) gives correlation matrices for large groups of boys and girls in secondary school. The average correlation between the g vectors of same-age boys and girls is .97 (combined N = 26,708). Several studies report data on adults. g loadings of the eight subtests of the GATB are reported by te Nijenhuis and van der Flier (1997) for applicants at Dutch Railways and by de Wolff and Buiten (1963) for seamen in the Royal Dutch Navy, resulting in a correlation of .90 (combined N = 1306). The US GATB manual (1970) gives correlation matrices for two large groups of adults, which yield a correlation between g vectors of .94 (combined N = 4519). Johnson, Bouchard, Krueger, McGue, and Gottesman (2004) report g loadings for a sample that took the WAIS, and Wechsler (1955) reports correlation matrices of the WAIS for adults of comparable age, from which g loadings could be computed. The correlation between the g vectors of the two studies is .72 (combined N = 736). So it appears that g vectors are quite reliable, especially when the samples are very large.

The number of tests in the batteries in the present study varied from 7 to 14. The number of tests does not necessarily influence the size of r_gd, but it clearly affects its variability. Because variability in the values of the artifacts influences the amount of variance in observed effect sizes that artifacts explain, we estimated this variability using data from the samples described in the previous paragraph.
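Why the number of subtests matters for the variability of r_gd can be illustrated with a small Monte Carlo sketch. It treats the k subtests of a battery as k independent draws of (g loading, score gain) from a bivariate normal distribution with a hypothetical population correlation of −.8, which is a strong simplification, and compares the spread of r for k = 7 and k = 14, the extremes in the present data. The function name and settings are our own.

```python
import numpy as np

def spread_of_r(k, rho, reps=20_000, seed=0):
    """Mean and SD of Pearson r computed over k (g, d) pairs drawn from a bivariate normal."""
    rng = np.random.default_rng(seed)
    cov = [[1.0, rho], [rho, 1.0]]
    xy = rng.multivariate_normal([0.0, 0.0], cov, size=(reps, k))  # shape (reps, k, 2)
    x = xy[..., 0] - xy[..., 0].mean(axis=1, keepdims=True)
    y = xy[..., 1] - xy[..., 1].mean(axis=1, keepdims=True)
    r = (x * y).sum(axis=1) / np.sqrt((x ** 2).sum(axis=1) * (y ** 2).sum(axis=1))
    return r.mean(), r.std()

for k in (7, 14):
    mean_r, sd_r = spread_of_r(k, rho=-0.8)  # hypothetical population correlation
    print(f"k = {k:2d}: mean r = {mean_r:.3f}, SD of r = {sd_r:.3f}")
```

Under these assumptions the SD of r comes out larger for seven subtests than for fourteen, which is the kind of variability the artifact distributions have to absorb.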

8.3. Correction for reliability of the vector of score gains

The value of r_gd is attenuated by the reliability of the vector of score gains for a given battery. When two samples have a comparable N, the average correlation between their vectors is an estimate of the reliability of each vector. The reliability of the vector of score gains was estimated from the present datasets by comparing samples that took the same test and differed little on background variables. For the comparisons of children, we chose samples
that were highly comparable with regard to age; for the comparisons of adults, we chose samples that were roughly comparable with regard to age.

The GATB manual (1970, ch. 15) describes 13 combinations of two studies in which large samples of men and women, comparable with respect to age and background, took the same GATB subtests. The average unweighted correlation between the d vectors of men and women is .83 (total N = 3760). The GATB manual (1970, ch. 20) also describes three combinations of three studies in which very large samples of boys and girls in the same grade of secondary school took the same GATB subtests. This yielded correlations between the d vectors of .99, .98, and .94, respectively (total N = 20,541). Together, van Geffen (1972) and Bosch (1973) report three Dutch GATB test–retest studies on secondary school children, resulting in three comparisons between d vectors. The average N-weighted correlation between these d vectors is .47 (total N = 127). Vectors of score gains from two different WISC-R datasets were compared: Tuma and Appelbaum (1980) tested children with an average age of 10, and Wechsler (1974) tested 10- and 11-year-olds. The correlation between the two d vectors is .71 (total N = 147). Comparing vectors of score gains from DAT datasets (Bennett, Seashore, & Wesman, 1974) resulted in correlations of .78 and .73, respectively, an average r of .76 (total N = 254). So it appears that d vectors are quite reliable, especially when the samples are very large. We estimated the reliabilities of the d vectors in the database using data from the samples described in this paragraph.
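The corrections for the reliability of the two vectors follow the classical disattenuation formula: the observed correlation is divided by the square root of the product of the two reliabilities. A minimal single-value sketch follows; the meta-analysis program works with distributions of reliabilities rather than one pair of values, the observed r of −.60 is hypothetical, and .86 and .83 are simply two of the reliabilities reported above, used for illustration.

```python
import math

def disattenuate(r_obs, rel_g, rel_d):
    """Correct r_gd for unreliability of both the g vector and the d vector."""
    return r_obs / math.sqrt(rel_g * rel_d)

r_obs = -0.60              # hypothetical observed r_gd
rel_g, rel_d = 0.86, 0.83  # illustrative reliabilities of the g and d vectors
r_corrected = disattenuate(r_obs, rel_g, rel_d)
print(f"{r_corrected:.2f}")  # -0.71
```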

8.4. Correction for restriction of range of g loadings

The value of r_gd is attenuated by the restriction of range of g loadings in many of the standard test batteries. The most highly g-loaded batteries tend to have the smallest range of variation in their subtests' g loadings. Jensen (1998a, pp. 381–382) shows that restriction in g loadedness strongly attenuates the correlation between g loadings and standardized group differences. Hunter and Schmidt (1990, pp. 47–49) state that the solution to range variation is to define a reference population and express all correlations in terms of that reference population. The Hunter and Schmidt meta-analytical program computes what the correlation in a given population would be if its standard deviation were the same as in the reference population. The standard deviations can be compared by dividing the study population standard deviation by the reference population standard deviation, that is, u = SD_study / SD_ref. As the reference we took the tests that are broadly regarded as exemplary for the measurement of the intelligence domain, namely the various versions of the Wechsler tests for children. The average standard deviation of g loadings of the various Dutch and US versions of the WISC-R and the WISC-III was 0.128. So the SD of g loadings of every test battery was compared with the average SD of g loadings in the Wechsler tests for children. As a result, some batteries, such as the GATB, have a value of u larger than 1.00.
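Assuming the program applies the classic Thorndike Case II formula for direct range restriction (an assumption on our part; the text does not name the formula), the correction can be written with u = SD_study / SD_ref as r_c = r / √(u² + r²(1 − u²)). A minimal sketch with a hypothetical observed r_gd of −.50 and the reference SD of 0.128 given above; the function name is our own.

```python
import math

SD_REF = 0.128  # average SD of g loadings in Dutch and US WISC-R / WISC-III (from the text)

def correct_range(r, sd_study, sd_ref=SD_REF):
    """Thorndike Case II correction of r to the reference-population SD of g loadings."""
    u = sd_study / sd_ref
    return r / math.sqrt(u ** 2 + r ** 2 * (1 - u ** 2))

# Hypothetical observed r_gd of -.50 in batteries with narrow, reference, and wide spreads
for sd_study in (0.08, 0.128, 0.16):
    print(f"SD = {sd_study:.3f}: u = {sd_study / SD_REF:.2f}, "
          f"corrected r = {correct_range(-0.50, sd_study):.2f}")
```

A battery whose g loadings vary less than those of the Wechsler reference (u < 1) has its correlation strengthened, here to about −.68, whereas a battery such as the GATB, with u > 1, has it weakened, here to about −.42.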

8.5. Correction for deviation from perfect construct validity

Deviation from perfect construct validity in g attenuates the value of r_gd. Any collection of cognitive tests we assemble is not a perfectly representative sample of the entire universe of possible cognitive tests, so any one limited sample of tests will not yield exactly the same g as any other limited sample. The sample values of g are affected by psychometric sampling error, but the fact that g is very substantially correlated across different test batteries implies that the differing obtained values of g can all be interpreted as estimates of a "true" g. The value of r_gd is attenuated by psychometric sampling error in each of the batteries from which a g factor has been extracted.

The more tests and the higher their g loadings, the higher the g saturation of the composite score. The Wechsler tests have a large number of subtests with quite high g loadings, resulting in a highly g-saturated composite score. Jensen (1998a, pp. 90–91) states that the g score of the Wechsler tests correlates more than .95 with the tests' IQ score. However, shorter batteries with a substantial number of tests with lower g loadings will lead to a composite with a somewhat lower g saturation. Jensen (1998a, ch. 10) states that the average g loading of an IQ score as measured by various standard IQ tests is in the +.80s. Taking this value as an indication of the degree to which an IQ score reflects "true" g, we can estimate that a test's g score correlates about .85 with "true" g. As g loadings are the correlations of tests with the g score, most empirical g loadings will most likely underestimate "true" g loadings, so empirical g loadings correlate about .85 with "true" g loadings. As the Schmidt and Le computer program only includes corrections for the first four artifacts, the correction for deviation from perfect construct validity was carried out on the value of r_gd after correction for the first four artifacts. To limit the risk of overcorrection, we conservatively chose the value of .90 for this correction.

Table 1. Dutch, British, and US studies of correlations between g loadings and gain scores

| Reference | Test | r | N | Information |
|---|---|---|---|---|
| Drenth et al. (1968) | AKIT | −.57 | 100 | Primary-school children |
| van Geffen (1972) | GATB | −.45 | 42 | Secondary-school children |
| | | −.21 | 42 | |
| Bosch (1973) | GATB | −.07 | 43 | Secondary-school children |
| Schroots and van Alphen de Veer (1979) | LDT | −.42 | 96 | Pre-school and secondary-school children |
| Bleichrodt et al. (1987) | RAKIT | .09 | 49 | Pre-school children |
| | | −.25 | 51 | Primary-school children |
| | | −.21 | 49 | Primary-school children |
| van der Doef et al. (1989) | WISC-R | −.69 | 22 | Primary-school children with learning problems |
| Mulder et al. (2004) | KAIT | −.23 | 46 | Secondary-school children + young adults |
| | | −.42 | 25 | Adults |
| Kort et al. (2005) | WISC-III | −.15 | 42 | Primary-school children |
| | | −.26 | 67 | Primary-school children |
| | | −.46 | 39 | Secondary-school children |
| Luteijn and Barelds (2005) | GIT2 | −.51 | 44 | Adults |
| Kooij et al. (2005) | WAIS-III | −.63 | 60 | Adults |
| Elliott (1983) | BAS | −.65 | 60 | Primary-school children |
| Wechsler (1967) | WPPSI | −.46 | 50 | Pre-school children |
| United States Department of Labor (1970) | GATB | −.35 | 156 | Office applicants |
| | | −.66 | 605 | Male high school seniors |
| | | −.70 | 554 | Female high school seniors |
| | | −.58 | 223 | Males, 1-day interval |
| | | −.41 | 186 | Females, 1-day interval |
| | | −.50 | 202 | Males, 2-week interval |
| | | −.52 | 152 | Females, 2-week interval |
| | | −.67 | 156 | Males, 6-week interval |
| | | −.61 | 168 | Females, 6-week interval |
| | | −.43 | 176 | Males, 13-week interval |
| | | .02 | 149 | Females, 13-week interval |
| | | −.62 | 157 | Males, 26-week interval |
| | | −.32 | 136 | Females, 26-week interval |
| | | −.69 | 119 | Males, 1-year interval |
| | | −.31 | 183 | Females, 1-year interval |
| | | −.96 | 118 | Males, 2-year interval |
| | | −.75 | 170 | Females, 2-year interval |
| | | −.75 | 123 | Males, 3-year interval |
| | | −.48 | 183 | Females, 3-year interval |
| | | −.92 | 3398 | Boys, secondary school |
| | | −.92 | 3680 | Girls, secondary school |
| | | −.91 | 3348 | Boys, secondary school |
| | | −.91 | 3491 | Girls, secondary school |
| | | −.84 | 3229 | Boys, secondary school |
| | | −.87 | 3395 | Girls, secondary school |
| Wechsler (1974) | WISC-R | −.48 | 97 | Primary-school children |
| | | −.66 | 102 | Primary-school children |
| | | −.21 | 104 | Secondary-school children |
| Bennett et al. (1974) | DAT | −.79 | 92 | Boys, secondary school |
| | | −.53 | 81 | Girls, secondary school |
| | | −.29 | 81 | Boys, secondary school |
| | | −.62 | 100 | Girls, secondary school |
| Covin (1977) | WISC-R | −.57 | 30 | Primary-school children with learning problems |
| Tuma and Appelbaum (1980) | WISC-R | −.08 | 45 | Primary- and secondary-school children |
| Matarazzo et al. (1980) | WAIS | −.10 | 29 | Young males |
| Wechsler (1981) | WAIS-R | −.64 | 71 | Adults |
| | | −.48 | 48 | Adults |
| McCormick et al. (1983) | ASVAB | −.73 | 57 | Adults |
| Kaufman and Kaufman (1983) | K-ABC | −.27 | 46 | Pre-school children |
| | | −.18 | 36 | Pre- and primary-school children |
| | | −.22 | 70 | Primary-school children |
| Wechsler (1997) | WAIS-III | −.45 | 100 | Young adults |
| | | −.57 | 102 | Adults |
| | | −.51 | 104 | Adults |
| | | .03 | 88 | Adults |
| Reeve and Lam (2005) | EAS | −.34 | 123 | Undergraduate students |

Note. In general, the g loadings were based on the correlation matrix in the manual containing the test–retest study, or on the correlation matrix based on the largest sample we could find. The sources of the g loadings not taken from the manual containing the test–retest study are as follows. van Geffen (1972) and Bosch (1973): de Wolff and Buiten (1963); see also Johnson, te Nijenhuis, and Bouchard (in press). Bleichrodt et al. (1987): te Nijenhuis et al. (2004), who used the same data on which the RAKIT manual is based. van der Doef, Kwint, and van der Koppel (1989): Dutch WISC-R manual. Elliott (1983): Table 9.8, age 9:0–9:11 years. US Department of Labor's GATB (1970): Jensen (1985, p. 214), using the largest correlation matrix in the GATB manual. Wechsler (1974), Covin (1977), and Tuma and Appelbaum (1980): Jensen (1985, p. 214, first study). Bennett et al. (1974): average of four highly similar correlation matrices. Matarazzo et al. (1980): Wechsler (1955, p. 17), Table 8, ages 25–34. McCormick et al. (1983): Ree and Carretta (1994). Reeve and Lam (2005) used SEM analyses with item parcels instead of full-scale scores to compute g loadings; the average g loading of all item parcels for a specific subtest was taken as the g loading of that subtest.

9. Results

The results of the studies on the correlation between g loadings and gain scores are shown in Table 1. The table gives data derived from 64 studies with a total of 26,990 participants. For each study it lists the reference, the cognitive ability test used, the correlation between g loadings and gain scores, the sample size, and background information. Virtually all correlations are negative, and the few positive correlations are very small.

Table 2 shows the results of the psychometric meta-analysis of the 64 data points. It shows (from left to right) the number of correlation coefficients (K), the total sample size (N), the mean observed correlation (r) and its standard deviation (SDr), the true correlation one can expect once artifactual error from unreliability in the g vector and the d vector and from range restriction in the g vector has been removed (ρ), and its standard deviation (SDρ). The next two columns present the percentage of variance explained by artifactual errors (VE) and the 95% credibility interval (95% CI). This interval denotes the values one can expect for ρ in 19 out of 20 cases.

Table 2. Meta-analysis results for correlations between g loadings and gain scores after corrections for reliability and restriction of range

| Studies included | K | N | r | SDr | ρ | SDρ | VE | 95% CI |
|---|---|---|---|---|---|---|---|---|
| All | 64 | 26,990 | −.80 | .20 | −.95 | .11 | 81 | −0.74 to −1.16 |
| All minus 3 outliers | 61 | 26,704 | −.81 | .18 | −.95 | .03 | 99 | −0.91 to −1.00 |

Note. K = number of correlations; N = total sample size; r = mean observed correlation (sample-size weighted); SDr = standard deviation of observed correlation; ρ = true correlation (observed correlation corrected for unreliability and range restriction); SDρ = standard deviation of true correlation; VE = percentage of variance accounted for by artifactual errors; 95% CI = 95% credibility interval.

The large number of data points and the very large sample size indicate that we can have confidence in the outcomes of this meta-analysis. The estimated true correlation has a value of −.95, and 81% of the variance in the observed correlations is explained by artifactual errors. However, Hunter and Schmidt (1990) state that extreme outliers should be left out of the analyses, because they are most likely the result of errors in the data. They also argue that strong outliers artificially inflate the SD of effect sizes and thereby reduce the amount of variance that artifacts can explain. We chose to leave out three outliers (more than 4 SD below the average r and more than 8 SD below ρ), comprising 1% of the research participants. This left the value of the true correlation unchanged, decreased the SD of ρ by 74%, and increased the amount of variance in the observed correlations explained by artifacts by 22%. So when the three outliers are excluded, artifacts explain virtually all of the variance in the observed correlations. Finally, a correction for deviation from perfect construct validity in g was carried out using the conservative value of .90. This resulted in a value of −1.06 for the final estimated true correlation between g loadings and score gains. Applying several corrections in a meta-analysis may lead to correlations larger than 1.00 in absolute value, as is the case here. Percentages of variance accounted for by artifacts larger than 100% are also not uncommon in psychometric meta-analysis, and such out-of-range values also occur in other methods of statistical estimation (see Hunter & Schmidt, 1990, pp. 411–414, for a discussion).
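The final construct-validity step is a single division of the correlation obtained after the first four corrections by the chosen value of .90, which is how a value beyond −1.00 arises. A minimal sketch with the two values reported above:

```python
RHO_AFTER_FOUR = -0.95     # true correlation after the first four corrections (Table 2)
CONSTRUCT_VALIDITY = 0.90  # conservative value chosen in the text

rho_final = RHO_AFTER_FOUR / CONSTRUCT_VALIDITY
print(f"{rho_final:.2f}")  # -1.06
```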

10. Discussion

A large-scale meta-analysis of 64 test–retest studies shows that after corrections for several artifacts there is an estimated true correlation of −1.06 between the g loadings of tests and score gains, and that virtually all of the variance in the observed correlations is attributable to these artifacts. As the artifacts explain virtually all the variance in the effect sizes, the other dimensions on which the studies differ, such as the age of the test takers, the test–retest interval, the test used, and whether samples were of average IQ or had learning problems, play no role at all.

The estimated true correlation of −1.06 is the result of various corrections for artifacts that attenuate the correlations. The estimated values of the artifacts may underestimate or overestimate their population values, so estimates of true effect sizes may likewise overestimate or underestimate the population value of the effect size. As a solution to this problem, Hunter and Schmidt (2004) suggest carrying out several meta-analyses on the same construct and taking the average of their estimated effect sizes. The general idea is that meta-analysis is a powerful research tool but does not give perfect outcomes.

A correlation of −1.06 falls outside the range of acceptable values of a correlation, but one has to distinguish between the meta-analytical estimate of the true correlation between g and d and the true correlation between g and d itself. We interpret the meta-analytical estimate of −1.06 as meaning that the true correlation between g and d is −1.00. A correlation of −1.00 means that there is a perfect inverse relationship between g and score gains: the tests with the highest g loadings show the smallest gains. The most straightforward interpretation of this very large negative correlation is that there is no g saturation in test–retest gain scores.

11. The South African learning potential study

In a carefully carried-out study, Skuy et al. (2002) used a dynamic testing procedure to see whether it would improve the scores of Black South African students on Raven's Standard Progressive Matrices (RSPM). The Bantu Education Act of 1954 established a discriminatory educational system characterized by poorly qualified teachers, sparsely equipped and underfunded schools, and generally poor-quality education. Most Black students in the sample had not received the same quality of education as White students. Black, White, Indian, and Colored research participants took the RSPM on two occasions, and in between, randomly constituted experimental groups were exposed to the Mediated Learning Experience. Both the Black South African group and the group consisting of White, Indian, and Colored South Africans improved over their baseline on the RSPM, and the Black group showed greater improvement.

The value of such cognitive interventions increases when the score gains transfer to other tests and to external criteria such as school or work achievement. The research participants therefore also took Feuerstein's Representational Stencil Design Test as a transfer measure. The subject is presented with a stencil of a geometric design and then asked to indicate which stencils need to be used, and in what sequence, to construct an identical design. Like the RSPM, the Stencils test requires representational/abstract thinking, but the training on the RSPM showed little transfer to it. Moreover, the correlation of the RSPM scores with performance in the end-of-year psychology examination did not significantly improve after mediation. Once again, the score gains were empty: they did not generalize. Skuy et al. go on to ask what it was that their interventions improved. Professor Skuy made his data accessible to the present authors, so we could perform additional analyses.

12. Sample

The data from Skuy et al. (2002) were used, with the exception of data from three research participants whose pretest IQ scores were extremely low (more than 3 SDs below the group mean). Ninety-five university students in psychology aged 16 to 29 (mean age = 20, SD = 2.3; 25 males, 70 females) participated in this study. They were 66 Black students (20 males, 46 females) and 29 White (20), Indian (6), and Colored (3) students (5 males, 24 females). The mean age of the Black group was 20 years (SD = 2.5), and that of the White, Indian, and Colored group was 19 years (SD = 1). Subjects were randomly assigned to the experimental group (n = 55) or the control group (n = 40).

13. Procedure

The students participated in a pretest and a posttest phase with a group intervention in between. The study focused on improvement in scores on the RSPM, using the Set Variations II of the Learning Propensity Assessment Device as the mediation task. Mediation training took 3 h and was conducted by three experienced psychologists with the assistance of six postgraduate psychology students. A detailed description is given in Skuy et al. (2002).

14. Measures and cognitive intervention

Raven's Standard Progressive Matrices consists of 60 items (divided into 5 sets of 12 items) designed to measure the ability to form comparisons, to reason by analogy, and to organize spatial information into related wholes. It has been established as one of the purest measures of g (Jensen, 1998a). Skuy et al. (2002) found no evidence of test bias against Blacks in South African education, and Rushton, Skuy, and Bons (2004) showed that the Raven's gave comparable predictive validities for students from various groups. Cross-cultural testing research has clearly shown that insufficient proficiency in the language of the test can lead to biased assessment on tests with a strong verbal component. However, the Raven's is a non-verbal test.

The Learning Propensity Assessment Device consists of 14 exercises. Each exercise contains an initial mediation task. Subsequent tasks increase in complexity and novelty and aim to help the learner achieve mastery over the task. The purpose of mediation is to help the learner develop the appropriate cognitive strategies and functions needed to complete the task successfully. The Set Variations II of the Learning Propensity Assessment Device consists of five sets of items, which comprise variations of Sets C, D, and E of the RSPM. Each set of variations contains a learning task for the purpose of initial mediation, followed by a series of progressively more difficult variations to which the skills learned must be applied. Mediation involves discussing with groups how to define the problem to be solved, focus on the task, set rules, regulate problem-solving behavior, and identify the correct sequence of logical steps needed to solve the task. Mediation also involves helping the subject to develop appropriate concepts, verbal tools, and insights in relation to the task. A detailed description is given in Skuy et al. (2002).

15. Statistical analyses

Although the Skuy et al. study is among the South African learning potential studies with the largest sample size, its N is not large. We therefore chose basic statistical analyses.

15.1. Descriptive statistics

Means, standard deviations, and reliabilities were computed for the various groups. With regard to measures of effect size, Hunter and Schmidt (1990, p. 271) advise choosing the estimates of variance with the least error. Because repeated test takings tend to change the size of the SD (Ackerman, 1987), we used the SD of the pretests as the denominator. The correlation between scores before and after the training was computed to see whether the training affected the rank order of individuals' scores.
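The effect size described above divides the pre–post difference by the pretest SD. The Python sketch below illustrates it and is not the authors' code. The names `effect_size_d`, `d_to_iq_points`, and `TABLE_4` are hypothetical. The means and SDs are the raw-score values reported in Table 4, and the conversion to IQ points assumes an IQ metric with SD 15.

```python
"""Standardized pre-post gain with the pretest SD as the denominator."""


def effect_size_d(pre_mean: float, post_mean: float, pre_sd: float) -> float:
    """Return (posttest mean - pretest mean) / pretest SD."""
    if pre_sd <= 0:
        raise ValueError("pretest SD must be positive")
    return (post_mean - pre_mean) / pre_sd


def d_to_iq_points(d: float, iq_sd: float = 15.0) -> float:
    """Express a standardized gain on an IQ metric (assumed SD of 15)."""
    return d * iq_sd


# (pretest M, posttest M, pretest SD) per group, as reported in Table 4.
TABLE_4 = {
    "Black experimental": (43.78, 50.10, 6.64),
    "Black control": (45.46, 48.35, 6.69),
    "Other experimental": (50.20, 55.80, 6.05),
    "Other control": (52.71, 55.36, 3.45),
}

if __name__ == "__main__":
    for group, (pre, post, sd) in TABLE_4.items():
        d = effect_size_d(pre, post, sd)
        print(f"{group:20s} d = {d:.2f}  (~{d_to_iq_points(d):.1f} IQ points)")
```

Run as a script, this reproduces the effect sizes of 0.95, 0.43, 0.93, and 0.77 listed in Table 4. The Black experimental gain of 0.95 SD corresponds to roughly 14 IQ points.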

15.2. Correlation between score gains and g loadedness

Because our sample was not large and was quite specific, estimates of g loadedness were taken from Lynn, Allik, and Irwing's (2004) item analysis of the RSPM in Estonia, which used a large (N = 2735), nationally representative sample. The same reasoning as in psychometric meta-analysis applies, namely that larger samples give better estimates of g loadings than smaller samples. In a hierarchical factor analysis of the items using structural equation modeling, Lynn et al. computed g loadings for 52 of the 60 items. In the present study, Pearson correlations were calculated between the g loadings of these 52 items and the effect sizes on these items.
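At the item level, the method of correlated vectors reduces to a Pearson correlation between two vectors indexed by item. The correlation is restricted to the items for which an external g loading exists (52 of 60 here). The sketch below illustrates this under the assumption that both inputs are keyed by item number. The helper names and the toy values in the usage example are hypothetical, not figures from Lynn et al. (2004) or Skuy et al. (2002).

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two equal-length vectors with at least 3 entries")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlated_vectors(g_loadings, item_gains):
    """Correlate item g loadings with item effect sizes.

    Both arguments map item number -> value; only items present in both
    (e.g., the items for which an external g loading exists) are used.
    Returns (r, number of items used).
    """
    shared = sorted(set(g_loadings) & set(item_gains))
    x = [g_loadings[i] for i in shared]
    y = [item_gains[i] for i in shared]
    return pearson_r(x, y), len(shared)


if __name__ == "__main__":
    # Hypothetical values: item 5 has no gain estimate, item 6 no g loading.
    g = {1: 0.20, 2: 0.40, 3: 0.60, 4: 0.80, 5: 0.50}
    d = {1: 0.90, 2: 0.70, 3: 0.50, 4: 0.30, 6: 1.00}
    r, k = correlated_vectors(g, d)
    print(f"r = {r:.2f} over {k} items")
```

Matching items by number, rather than by list position, prevents the vectors from silently misaligning when some items lack a loading.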

15.3. g loadings

The RSPM consists of dichotomous items, so we computed a matrix of polychoric correlations (Nunnally & Bernstein, 1994). A principal axis factor analysis was carried out, and the percentage of variance explained by the first unrotated factor was taken as an estimate of g loadedness. Because sample size was limited, we collapsed the experimental and the control groups.
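Principal axis factoring of a correlation matrix can be sketched with iterated communality estimates and power iteration for the leading eigenvector. This is an illustration, not the software the authors used, and it rests on several assumptions. It takes an already estimated (e.g., polychoric) correlation matrix as input. It assumes all correlations are positive, so that the leading eigenvalue dominates. It starts the communalities at each item's largest absolute correlation, which is only one of several common choices. The function names are hypothetical.

```python
import math


def _leading_eigenpair(a, n_iter=1000, tol=1e-13):
    """Power iteration for the largest eigenvalue/eigenvector of a symmetric
    matrix with positive entries (positive manifold assumed)."""
    p = len(a)
    v = [1.0 / math.sqrt(p)] * p
    eigval = 0.0
    for _ in range(n_iter):
        w = [sum(a[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
        # Rayleigh quotient v' A v for the normalized vector
        av = [sum(a[i][j] * v[j] for j in range(p)) for i in range(p)]
        new_eigval = sum(vi * avi for vi, avi in zip(v, av))
        if abs(new_eigval - eigval) < tol:
            return new_eigval, v
        eigval = new_eigval
    return eigval, v


def principal_axis_first_factor(corr, n_iter=500, tol=1e-10):
    """Iterated principal axis factoring, first factor only.

    Returns (loadings, proportion of total variance explained by the factor).
    """
    p = len(corr)
    # Starting communalities: largest absolute off-diagonal correlation per item.
    h2 = [max(abs(corr[i][j]) for j in range(p) if j != i) for i in range(p)]
    loadings = [0.0] * p
    for _ in range(n_iter):
        reduced = [[h2[i] if i == j else corr[i][j] for j in range(p)]
                   for i in range(p)]
        eigval, vec = _leading_eigenpair(reduced)
        loadings = [x * math.sqrt(max(eigval, 0.0)) for x in vec]
        new_h2 = [x * x for x in loadings]
        change = max(abs(a - b) for a, b in zip(new_h2, h2))
        h2 = new_h2
        if change < tol:
            break
    if sum(loadings) < 0:  # eigenvector sign is arbitrary
        loadings = [-x for x in loadings]
    return loadings, sum(x * x for x in loadings) / p


if __name__ == "__main__":
    # Hypothetical one-factor correlation matrix built from known loadings.
    true_loadings = [0.8, 0.7, 0.6, 0.5]
    corr = [[1.0 if i == j else true_loadings[i] * true_loadings[j]
             for j in range(4)] for i in range(4)]
    loadings, share = principal_axis_first_factor(corr)
    print([round(x, 3) for x in loadings], f"{share:.1%}")
```

On this constructed matrix, the routine recovers loadings of about .8, .7, .6, and .5. The first factor then explains about 43.5% of the total variance, which is the kind of percentage reported in the Results.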

15.4. Correlation between sum scores and score gains

We tested whether individuals with low g improved their scores more than those with high g by correlating gain scores with pretest RSPM scores for each of the four research groups. As gain scores tend to be negatively correlated with pretest scores as a function of unreliability (see Cronbach, 1990; Nunnally & Bernstein, 1994), we corrected the correlations using Tucker, Damarin, and Messick's (1966) formula 63. Using this formula, one adds to each correlation the term (SD pretest / SD gain score) × (1 − reliability pretest).

Table 3
Proportion of sample selecting the correct answer on items of Raven's Standard Progressive Matrices, by group

Set A              | Set B              | Set C              | Set D              | Set E
Item  Black  Other | Item  Black  Other | Item  Black  Other | Item  Black  Other | Item  Black  Other
   1   1.00   1.00 |   13   1.00   1.00 |   25   1.00    .97 |   37   1.00   1.00 |   49    .74    .90
   2    .97   1.00 |   14   1.00   1.00 |   26    .96   1.00 |   38    .99   1.00 |   50    .64    .90
   3    .97   1.00 |   15   1.00   1.00 |   27    .96   1.00 |   39    .89   1.00 |   51    .79    .97
   4   1.00    .97 |   16    .91    .97 |   28    .86    .93 |   40    .92   1.00 |   52    .56    .83
   5   1.00   1.00 |   17    .96    .97 |   29    .94    .97 |   41    .96   1.00 |   53    .52    .83
   6    .99   1.00 |   18    .85   1.00 |   30    .76    .83 |   42    .92   1.00 |   54    .35    .76
   7    .94    .97 |   19    .77    .66 |   31    .88    .97 |   43    .77   1.00 |   55    .42    .79
   8    .91    .93 |   20    .79    .97 |   32    .50    .79 |   44    .76    .93 |   56    .21    .69
   9   1.00    .97 |   21    .83    .97 |   33    .74    .90 |   45    .71    .97 |   57    .30    .41
  10    .91    .97 |   22    .92   1.00 |   34    .61    .79 |   46    .79    .93 |   58    .12    .41
  11    .83    .90 |   23    .80    .90 |   35    .53    .69 |   47    .29    .41 |   59    .02    .17
  12    .68    .83 |   24    .59    .83 |   36    .06    .35 |   48    .26    .38 |   60    .11    .21

Note. Other = White, Indian, and Colored.
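The correction described above adds (SD pretest / SD gain) × (1 − reliability pretest) to the observed gain–pretest correlation. The sketch below applies that term. It also simulates hypothetical data in which true gains are unrelated to true ability. The simulation shows that measurement error alone yields a negative gain–pretest correlation, which the added term roughly cancels. The function names and simulation settings are assumptions chosen for illustration: normal true scores and errors, and a pretest reliability of .80.

```python
import math
import random


def tucker_corrected_r(r_gain_pre, sd_pre, sd_gain, rel_pre):
    """Add (SD_pretest / SD_gain) * (1 - reliability_pretest) to the observed r."""
    return r_gain_pre + (sd_pre / sd_gain) * (1.0 - rel_pre)


def _mean_sd(xs):
    m = sum(xs) / len(xs)
    return m, math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))


def _pearson(xs, ys):
    mx, sx = _mean_sd(xs)
    my, sy = _mean_sd(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return cov / (sx * sy)


def simulate_gain_pretest_r(n=100_000, rel=0.80, seed=1):
    """Hypothetical data: true gains are independent of true pretest ability.

    True scores have variance 1; error variance is chosen so that the
    pretest reliability var(T) / var(X) equals `rel`.
    """
    rng = random.Random(seed)
    err_sd = math.sqrt((1.0 - rel) / rel)
    pre, gain = [], []
    for _ in range(n):
        true_score = rng.gauss(0.0, 1.0)
        true_gain = rng.gauss(0.5, 0.3)
        x = true_score + rng.gauss(0.0, err_sd)              # pretest
        y = true_score + true_gain + rng.gauss(0.0, err_sd)  # posttest
        pre.append(x)
        gain.append(y - x)
    r = _pearson(pre, gain)
    _, sd_pre = _mean_sd(pre)
    _, sd_gain = _mean_sd(gain)
    return r, tucker_corrected_r(r, sd_pre, sd_gain, rel)


if __name__ == "__main__":
    r_obs, r_corr = simulate_gain_pretest_r()
    print(f"observed r = {r_obs:.3f}, corrected r = {r_corr:.3f}")
```

With these settings, the observed correlation comes out near −.29 even though true gains are independent of ability, and the corrected value is close to zero.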

16. Results

16.1. Descriptive statistics

Internal consistencies (Cronbach's αs) on the RSPM ranged from .76 to .86 for the pre- and posttests, respectively. Table 3 shows the proportion of each of the groups that selected the correct answer on each of the 60 items of the pretest. Across the 60 items, the order of the p values was almost identical for Blacks and White/Indian/Coloreds (r = .92, p = .00).
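The agreement in item difficulty between the two groups can be checked directly against the proportions in Table 3. The sketch below correlates the two columns of pretest p values across the 60 items. The variable and function names are hypothetical. It uses the two-decimal proportions printed in the table, so its result need not match the reported r = .92 to the last digit.

```python
import math

# Proportion correct on the pretest, items 1-60, as printed in Table 3.
BLACK = [
    1.00, .97, .97, 1.00, 1.00, .99, .94, .91, 1.00, .91, .83, .68,    # Set A
    1.00, 1.00, 1.00, .91, .96, .85, .77, .79, .83, .92, .80, .59,     # Set B
    1.00, .96, .96, .86, .94, .76, .88, .50, .74, .61, .53, .06,       # Set C
    1.00, .99, .89, .92, .96, .92, .77, .76, .71, .79, .29, .26,       # Set D
    .74, .64, .79, .56, .52, .35, .42, .21, .30, .12, .02, .11,        # Set E
]
OTHER = [  # White, Indian, and Colored students
    1.00, 1.00, 1.00, .97, 1.00, 1.00, .97, .93, .97, .97, .90, .83,   # Set A
    1.00, 1.00, 1.00, .97, .97, 1.00, .66, .97, .97, 1.00, .90, .83,   # Set B
    .97, 1.00, 1.00, .93, .97, .83, .97, .79, .90, .79, .69, .35,      # Set C
    1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, .93, .97, .93, .41, .38, # Set D
    .90, .90, .97, .83, .83, .76, .79, .69, .41, .41, .17, .21,        # Set E
]


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    assert len(BLACK) == len(OTHER) == 60
    print(f"r across items = {pearson_r(BLACK, OTHER):.2f}")
```

Run on the printed proportions, this gives r ≈ .92, in line with the value reported above.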

Table 4 shows the means and standard deviations of the total RSPM scores for the four groups, along with the d effect sizes representing the difference between pre- and posttest scores (Cohen, 1988). First, we examined whether there was an effect of race (Black vs. White/Indian/Colored) and group (experimental vs. control) on the pretest scores. There was a significant effect of race (F(1, 91) = 24.13, p = .00, η² = .21) but not of group (F(1, 91) = 2.28, p = .14, η² = .02). This means that the mean pretest scores of Blacks (M = 44.44, SD = 6.65) were lower than those of White/Indian/Coloreds (M = 51.41, SD = 5.05). It also means that the mean pretest scores of the experimental and control groups were comparable (M = 45.53, SD = 7.04, and M = 48, SD = 6.7, respectively).

Secondly, we investigated the effects of training on the posttest scores by performing a two-way ANCOVA on the total posttest scores, with race and group as factors and the total pretest scores as the covariate. There was a significant effect of group (F(1, 95) = 13.81, p = .00, η² = .13) and of race (F(1, 90) = 3.99, p = .05, η² = .04). There was no significant effect of the two-way interaction of group and race (F(1, 90) = 0.28, p = .60, η² = .00). These results indicate that the training was equally effective for the Black and the White/Indian/Colored students. Posttest scores of Blacks (M = 49.41, SD = 5.91), however, remained significantly lower (F(1, 91) = 28.33, p = .00) than those of Whites/Indians/Coloreds (M = 55.59, SD = 3.55). Posttest scores of the experimental group (M = 51.65, SD = 5.53) were higher than those of the control group (M = 50.8, SD = 6.65), but the difference between the two groups was nonsignificant (F(1, 91) = 0.85, p = .36).

Table 4
Pre- and posttest mean Raven's scores, standard deviations, and mean effect sizes for Black and White/Indian/Colored students

              Black experimental  Black control       Other experimental  Other control
              (n = 40)            (n = 26)            (n = 15)            (n = 14)
              Pretest   Posttest  Pretest   Posttest  Pretest   Posttest  Pretest   Posttest
Raw score M   43.78     50.10     45.46     48.35     50.20     55.80     52.71     55.36
SD            6.64      5.31      6.69      6.71      6.05      3.76      3.45      3.43
Percentile    14        41        16        31        41        75        55        68
Effect size   0.95                0.43                0.93                0.77

Note. Percentiles are based on US adult norms; see Raven, Raven, and Court's (2000) Table SPM13. Other = White, Indian, and Colored.

The correlation between scores before and after the training was .84 (p = .00) for the experimental group and .90 (p = .00) for the control group. The training therefore had only a limited effect on the rank order of individuals' scores. This means that the test measures the same constructs strongly, but not perfectly, on both occasions.

16.2. Correlation between score gains and g loadedness

We estimated effect sizes for each of the four groups (race by condition). Each effect size was the difference between the mean pretest and posttest scores, divided by the standard deviation of the pretest scores of the Black or the White/Indian/Colored students, respectively. We then calculated the correlations between these effect sizes and the g loadings taken from Lynn et al. Correlations were −.24 (p = .10) for the Black experimental group and −.21 (p = .20) for the White/Indian/Colored experimental group. They were −.08 (p = .59) for the Black control group and −.41 (p = .01) for the White/Indian/Colored control group. Small sample sizes usually attenuate correlations (Hunter & Schmidt, 1990). Collapsing the groups indeed resulted in higher average correlations: −.39 for the complete experimental group and −.26 for the complete control group.

16.3. g loadings

We ran principal axis factor analyses on the combined experimental and control group, separately for the pretest and posttest scores. The first unrotated factor explained 22% of the variance in the pretest scores and 18% of the variance in the posttest scores. These findings suggest that the g loadedness of the RSPM decreased substantially after the Mediated Learning Experience.

16.4. Correlation between score gains and sum score

We correlated score gains with RSPM total scores for each group. The correlations were −.60 (p = .00) for the Black experimental group and −.18 (p = .38) for the Black control group. They were −.82 (p = .00) for the White/Indian/Colored experimental group and −.48 (p = .08) for the White/Indian/Colored control group. After applying the correction formula of Tucker et al. (1966), these correlations became −.39, −.08, −.61, and −.35, respectively. Overall, these correlations show that low-g persons improved their scores more strongly than high-g persons.

17. Discussion

Skuy et al. (2002) hypothesized that the low-quality education of Blacks in South Africa would lead IQ tests to underestimate their cognitive abilities. Groups of Black and White/Indian/Colored students took the Raven's Progressive Matrices twice and received Feuerstein's Mediated Learning Experience in between. The test scores went up substantially in all groups. Evidence of an authentic change in the g factor requires broad transfer, or generalizability across a wide variety of cognitive performance. However, Skuy et al. showed that the gains did not generalize to scores on another, highly similar test or to external criteria, and were therefore hollow. As the score gains were in some cases quite large (14 IQ points for the Black experimental group), the question becomes: what is it that improved?

The findings show that the correlations between score gains and the g loadedness of the items were −.39 for the complete experimental group and −.26 for the complete control group. However, because the g loadings and gain scores are measured at the item level, their reliabilities are not high. This results in substantial attenuation of the correlation between g and d. Moreover, the RSPM does not measure g perfectly: Jensen (1998a, p. 91) estimates its g loading at .83. Suppose we estimate the reliability of the g vector at .70 and the reliability of the gain score vector at .50. Correcting only for unreliability and for the deviation from perfect construct validity of g then yields estimated true correlations of −.80 and −.53, respectively. These values should be taken as underestimates. Controlling for additional artifacts would bring them closer to the very strong negative correlation found in the meta-analysis.
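The corrections in the preceding paragraph can be reproduced with the classical disattenuation formula. The observed vector correlation is divided by the square root of the product of the two vector reliabilities and then by the RSPM's g loading. This sketch assumes the standard correction applies here; the function name is hypothetical. The inputs are the values stated in the text: reliabilities of .70 and .50 and a g loading of .83. Computed without intermediate rounding, the experimental-group value comes out at about −.79, close to the −.80 reported.

```python
import math


def disattenuate(r_obs, rel_x, rel_y, construct_validity=1.0):
    """Correct an observed correlation for unreliability in both variables
    and for imperfect construct validity of one of them."""
    for value in (rel_x, rel_y, construct_validity):
        if not 0.0 < value <= 1.0:
            raise ValueError("reliabilities and validity must lie in (0, 1]")
    return r_obs / (math.sqrt(rel_x * rel_y) * construct_validity)


if __name__ == "__main__":
    REL_G, REL_D, G_LOADING_RSPM = 0.70, 0.50, 0.83  # values stated in the text
    for label, r in (("experimental", -0.39), ("control", -0.26)):
        print(f"{label}: {disattenuate(r, REL_G, REL_D, G_LOADING_RSPM):.2f}")
```

This prints −0.79 for the experimental group and −0.53 for the control group.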

The findings suggest that after training, the g loadedness of the test decreased substantially. We found substantial negative correlations between gain scores and RSPM total scores. Table 4 shows that the total score variance decreased after training, which is in line with low-g subjects improving more than high-g subjects. As a rule, high-g individuals profit the most from training, as is reflected in the ubiquitous positive correlation between IQ scores and training performance (Jensen, 1980; Schmidt & Hunter, 1998). These findings could therefore be interpreted as an indication that Feuerstein's Mediated Learning Experience is not g-loaded, in contrast with regular trainings, which are clearly g-loaded. Substantial negative correlations between gain scores and RSPM total scores are no definite proof of this hypothesis, but they are in line with it. Further support for our hypothesis that the Feuerstein training has little or no g loadedness comes from Coyle (2006), who showed that gain scores loaded virtually zero on the g factor. Moreover, Skuy et al. reported that the predictive validity of their measure did not increase when the second Raven's score was used. The fact that individuals with low g gained more than those with high g could be interpreted as an indication that the Mediated Learning Experience was not g-loaded. It should be noted, however, that Feuerstein most likely did not intend his intervention to be g-loaded. He was interested in increasing the performance of low scorers on both tests and external criteria.

18. General discussion

IQ scores are by far the best general predictor of success in education, job training, and work. However, there are many ways in which these IQ scores can be increased, for instance by means of retesting or participating in a learning potential training program. What conclusions can be drawn from such score gains? Jensen (1998a) hypothesized that the effects of training on abilities can be summarized in terms of Carroll's three-stratum hierarchical factor model. We tested this hypothesis in a meta-analysis of test–retest data from Dutch, British, and American test batteries, and with learning potential data from South Africa using Raven's Progressive Matrices. The meta-analysis convincingly shows that test–retest score gains are not g-loaded. The findings from the learning potential study are clearly in line with this. When the attenuation caused by unreliability and other artifacts is taken into account, the correlation between item g loadings and item gains is somewhat comparable to the one found in the meta-analysis of test batteries. The data suggest that the g loadedness of item scores decreases after the intervention training. Te Nijenhuis et al.'s (2001) finding that practice and coaching reduced the g loadedness of their test scores strengthens the present findings using item scores. The findings show that the largest increases came not from the high-g participants, as is common in training situations, but from the low-g persons. This suggests that the intervention training is not g-loaded.

Our findings fit quite well with the hierarchical model of intelligence. The generalizability of test scores resides predominantly in the g component, whereas the test-specific ability component and the narrow ability component are virtually non-generalizable. This is evidenced, for instance, by the earlier finding that adding verbal tests or numerical tests to a g score resulted in only a very small incremental validity (Ree & Earles, 1991; Ree et al., 1994). Additionally, Ericsson and Lehmann (1996) reported immense gains on a memory task focusing on one narrow ability. However, they found no improvement on comparable memory tasks focusing on another narrow ability. Because the score gains are not related to g, the generalizable g component decreases. It is also quite possible that the Feuerstein training itself is not g-loaded. Together, these points explain why the score gains did not generalize to scores on the cognitively loaded Representational Stencil Design Test. For a similar reason, the score gains did not generalize to g-loaded external criteria: the correlation of the RSPM scores with performance in the end-of-year psychology examination did not significantly improve after mediation. Reeve and Lam (2005) claimed that retesting does not change the nature of what is being tested, but our findings suggest the opposite.

19. Limitations of the studies

Our meta-analysis and our analysis of the South African study rely strongly on the method of correlated vectors (MCV), which has recently been shown to have limitations. Dolan and Lubke (2001) showed that substantial positive vector correlations can arise in group comparisons even when groups differ not only on g but also on factors uncorrelated with g. Ashton and Lee (2005) showed that a variable's associations with non-g sources of variance can produce a vector correlation of zero even when the variable is strongly associated with g. They suggest that the g loadings of a subtest are sensitive to the nature of the other subtests in a battery. A specific sample of subtests may therefore produce a spurious correlation between the vectors. Notwithstanding these limitations, studies using MCV continue to appear (see, for instance, Colom, Jung, & Haier, in press; Hartmann, Kruuse, & Nyborg, in press; Lee et al., 2006). The outcomes of our meta-analysis of a large number of studies using the method of correlated vectors may make an interesting contribution to the discussion of its limitations.

A principle of meta-analysis is that the amount of information contained in any individual study is quite modest. Therefore, one should analyze all studies on one topic and correct for artifacts, which greatly increases the amount of information. Our meta-analytical value of r = −1.06 is virtually identical to the theoretically expected correlation between g and d of −1.00. This holds some promise that a psychometric meta-analysis of studies using MCV is a powerful way of reducing some of the limitations of MCV. An alternative methodological approach is to limit oneself to the rare datasets that allow structural equation modeling. However, from a meta-analytical point of view, these studies yield only a quite modest amount of information.
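The paragraph above rests on pooling studies and correcting for artifacts. The sketch below shows only the first, "bare-bones" step of a Hunter–Schmidt psychometric meta-analysis. That step computes the sample-size-weighted mean correlation and the part of its variance expected from sampling error, using the usual expression (1 − r̄²)² / (N̄ − 1). The study correlations and Ns in the example are hypothetical. The further corrections for unreliability and other artifacts, which produced the reported r = −1.06, are not shown. The class and function names are also hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BareBonesResult:
    mean_r: float        # sample-size-weighted mean correlation
    observed_var: float  # N-weighted variance of the study correlations
    sampling_var: float  # variance expected from sampling error alone
    residual_var: float  # observed minus sampling-error variance (floored at 0)


def bare_bones(rs, ns):
    """Bare-bones meta-analysis of correlations rs with sample sizes ns."""
    if len(rs) != len(ns) or not rs:
        raise ValueError("need one N per correlation")
    total_n = sum(ns)
    mean_r = sum(n * r for r, n in zip(rs, ns)) / total_n
    observed_var = sum(n * (r - mean_r) ** 2 for r, n in zip(rs, ns)) / total_n
    mean_n = total_n / len(ns)
    sampling_var = (1 - mean_r ** 2) ** 2 / (mean_n - 1)
    return BareBonesResult(mean_r, observed_var, sampling_var,
                           max(observed_var - sampling_var, 0.0))


if __name__ == "__main__":
    # Hypothetical g-loading/score-gain correlations from three studies.
    print(bare_bones([-0.90, -0.95, -0.80], [100, 300, 100]))
```

Weighting by N lets the larger studies dominate the mean. Comparing the observed variance with the sampling-error variance shows how much of the spread across studies could be due to chance alone.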

Additional meta-analyses of studies employing MCV are necessary to establish the validity of combining MCV with psychometric meta-analysis. Most likely, many would agree that a high positive meta-analytical correlation between measures of g and measures of another construct implies that g plays a major role. Likewise, a meta-analytical correlation of −1.00 would imply that g plays no role. However, it is not clear what value of the meta-analytical correlation to expect from MCV when g plays only a modest role. The present meta-analysis concerned a construct that clearly has an inverse relationship with g. It would now be informative to carry out meta-analyses of studies on variables that are strongly linked to g and on variables that are modestly linked to g. An example of the latter would be secular score gains, which, according to Lynn's (1990) nutrition theory, should be modestly g-loaded.

The sample sizes in the South African study are not large. They are still larger, however, than those in many other studies of learning potential, where an N ≈ 10 is not unusual. A reanalysis of the many existing studies on dynamic testing could lead to a meta-analysis with a large combined N. The mean posttest score was quite high, so a ceiling effect may have occurred for the White/Indian/Colored group, leading to an underestimation of the experimental score gain for this group.

Instead of testing the hypothesis with a strongly unidimensional test such as the RSPM, it would be better to use a multidimensional test. Moreover, a large sample size would allow the use of more rigorous data-analytical techniques, leading to more definitive results. However, to the best of our knowledge, datasets meeting these requirements do not exist, and the Skuy et al. study is arguably the best South African learning potential study.

20. Score gains as low-quality measures of motivation

As criterion-related validity is strongly dependent on g, te Nijenhuis et al.'s finding of lowered g loadings after training should result in lowered criterion-related validity. However, the empirical findings show the opposite. Virtually all test–retest and test preparation studies on cognitive tests and scholastic aptitude tests that reported both criterion-related validities show small to modest increases for the second or third test score (see Allalouf & Ben-Shakhar, 1998; Bashi, 1976; Coyle, 2006; Hausknecht, Trevor, & Farr, 2002; Jones, 1986; Linn, 1977; Olsen & Schrader, 1959; Ortar, 1960; Powers, 1985; Reeve & Lam, 2005). Allalouf and Ben-Shakhar (1998) carried out a carefully designed study of a university entrance test. The experimental group received an intensive 40-h test coaching program, while the control group did not. The criterion-related validity of the retest increased for both groups. Most importantly, the increase was the same; it was not larger for the experimental group.

Resing (1990; see Table 4.23) conducted a little-known but carefully designed large-scale learning potential study. She compared an experimental group that received a pretest, a learning potential training, and a posttest with a control group that received only the pretest and the posttest. The mean criterion-related validity of the various second scores was .62 for both the experimental and the control group. Learning potential training thus did not add criterion-related validity beyond that produced by simply retesting. The findings from both Resing and Allalouf and Ben-Shakhar suggest that cognitive interventions do not increase criterion-related validity more than simple retesting does.

g and the personality measure conscientiousness have been shown to make an excellent combination of predictors (Schmidt & Hunter, 1998). Conscientiousness represents, among other characteristics, persistence, a will to achieve, and the ability to focus effort on a goal. A field study on test preparation using actual job applicants (Clause, Delbridge, Schmitt, Chan, & Jennings, 2001) showed that motivation to perform well on the test correlated .25 with test performance. One could speculate that score increases do not reflect a true cognitive component but instead serve as low-quality measures of motivation. Retesting and learning potential training increase validity only modestly compared with the large increase obtainable from personality questionnaires. Personality testing might therefore provide a less expensive and more accurate alternative.

21. Effectiveness of various training formats

Components of the mediation training used by Skuy et al. (2002) are similar to the test training used in te Nijenhuis et al. (2001). Both the Dutch and the South African training took 3 h. However, the Dutch training focused on two different test formats, whereas the South African training dealt with only one. The test training by Lloyd and Pidgeon (1961) took even less time, namely two half-hour segments, each focusing on one test format. The effect sizes in all studies were roughly comparable. This suggests that the methods employed by te Nijenhuis et al. and by Lloyd and Pidgeon were more efficient than those used by Skuy et al. Possibly, the components of the mediation training that are absent from the other two formats do not raise test scores and could therefore be left out. If so, a relatively simple 1-h training might be enough to increase scores on the RSPM by one SD.

22. Generalizability of findings

Can these findings of hollow score gains after test–retest, test practice, and Mediated Learning Experience training be generalized to other studies in which training-induced score gains were found? Ericsson and Lehmann (1996) reported tremendous score increases after intensive training on numeric memory tests, but these gains did not generalize in the least to verbal memory tests. Such gains on one narrow ability do not generalize to another narrow ability clustering under the same broad ability and are therefore hollow. Similarly, Jensen (1998b) showed that score gains due to adoption were not on the g factor and were therefore most likely hollow.

Rushton (1999) argued that intergenerational score gains are not linked to g, suggesting that the Flynn effect may be empty, but he was strongly criticized by Flynn (1999, 2000). In studies on the Flynn effect, the score gains found in cross-sectional studies are largest on the RSPM (Flynn, 1987). Lynn (1998) has suggested that a substantial part of these intergenerational gains on the RSPM is generalizable, in that it reflects higher g. In his view, the remaining part is hollow and should be interpreted as schooling effects. The RSPM does require the application of the mathematical principles of addition, subtraction, progression, and the distribution of values. Over the three decades (1950s–1980s) in which these increases in RSPM scores have occurred, increasing proportions of 15- to 18-year-olds have remained in school. There they have learned math skills that they have applied to the solution of matrices problems. Our findings could be interpreted as support for Lynn's hypothesis that score gains on the RSPM are partly hollow. Notwithstanding the high g loading of the RSPM sum score, it is quite sensitive to test–retest effects and training effects. Some studies on the Flynn effect (Lynn & Hampson, 1986; Teasdale & Owen, 1989) show that the increase in scores is largely concentrated in the lower segments of the IQ distribution. Our finding that low scorers show the largest gains after training may additionally support the notion that part of the Flynn effect on the RSPM is hollow. Finally, Wicherts et al. (2004) found that in some of their datasets, the secular score gains are most strongly linked to broad, narrow, and test-specific abilities. This shows that an important part of the gains is non-generalizable.

Ceci (1991) showed that increased schooling leads to higher IQ scores, but are these gains highly specific or predominantly generalizable? It would be interesting to apply the techniques we used in this study to the findings from previous intervention studies. Biological interventions (such as diet, vitamin supplements, and vaccination against infectious disease) may be more cost-effective than psychological or educational interventions at producing true changes in g and broad abilities. There may be a biological barrier between the first and the second stratum that restricts the effects of behavioral interventions to narrow abilities and test specificities.

Acknowledgement

We would like to thank Mervyn Skuy for his permission to use his dataset.

Thanks to Marié de Beer, Raegan Murphy, Welko Tomic, Art Jensen, and Frank Schmidt for feedback on previous versions of this paper.

Thanks to Arne Evers, Wilma Resing (Dutch Test Committee), and Andress Kooij (Harcourt) for also helping to locate test–retest studies.

References

Ackerman, P. L. (1986). Individual differences in information processing: An investigation of intellectual abilities. Intelligence, 10, 101–139.

Ackerman, P. L. (1987). Individual differences in skill learning: An integration of psychometric and information processing skills. Psychological Bulletin, 102, 3–27.

Allalouf, A., & Ben-Shakhar, G. (1998). The effect of coaching on the predictive validity of scholastic aptitude tests. Journal of Educational Measurement, 35(1), 31–47.

Ashton, M. C., & Lee, K. (2005). Problems with the method of correlated vectors. Intelligence, 33, 431–444.

Bashi, Y. (1976). Verbal and non-verbal abilities of 4th, 6th and 8th grade students in the Arab educational system in Israel. Jerusalem: Hebrew University, School of Education.


Bennett, G. K., Seashore, H. G., & Wesman, A. G. (1974). Differential Aptitude Tests (5th ed.): Manual. New York: The Psychological Corporation.

Bleichrodt, N., Resing, W. C. M., Drenth, P. J. D., & Zaal, J. N. (1987). Intelligentie-meting bij kinderen: Empirische en methodologische verantwoording van de geReviseerde Amsterdamse Kinder Intelligentie Test [Measuring the intelligence of children: Empirical and methodological justification of the Revised Amsterdam Children Intelligence Test]. Lisse, the Netherlands: Swets.

Boeyens, J. C. A. (1989). Learning potential: An empirical investigation. Pretoria, South Africa: Human Science Research Council.

Bosch, F. (1973). Inventarisatie, beschrijving en onderzoek m.b.t. de wijzigingen van de GATB, incl. test-hertest onderzoek (No. Pz3bRp0120) [Stock-taking, description, and research concerning the modifications of the GATB, including a test–retest study]. Utrecht, the Netherlands: Nederlandse Spoorwegen.

Carroll, J. B. (1993). Human cognitive abilities: A survey of factor analysis studies. Cambridge University Press.

Ceci, S. J. (1991). How much does schooling influence general intelligence and its cognitive components? A reassessment of the evidence. Developmental Psychology, 27, 703–722.

Christian, K., Bachman, H. J., & Morrison, F. J. (2001). Schooling and cognitive development. In R. J. Sternberg & E. L. Grigorenko (Eds.), Environmental effects on cognitive abilities (pp. 287–335). Mahwah, NJ: Erlbaum.

Clause, C. S., Delbridge, K., Schmitt, N., Chan, D., & Jennings, D. (2001). Test preparation activities and employment test performance. Human Performance, 14, 149–167.

Cohen, J. (1988). Statistical power analysis for the behavioral sciences. Hillsdale: Lawrence Erlbaum.

Colom, R., Jung, R. E., & Haier, R. J. (in press). Finding the g-factor in brain structure using the method of correlated vectors. Intelligence.

Covin, T. A. (1977). Stability of the WISC-R for 9-year-olds with learning difficulties. Psychological Reports, 40, 1297–1298.

Coyle, T. R. (2006). Test–retest changes on scholastic aptitude tests are not related to g. Intelligence, 34, 15–27.

Cronbach, L. J. (1990). Essentials of psychological testing. New York: HarperCollins.

de Villiers, A. B. (1999). Disadvantaged students' academic performance: Analysing the zone proximal development. Unpublished DPhil thesis, University of Cape Town, South Africa.

de Wolff, C. J., & Buiten, B. (1963). Een factoranalyse van vier testbatterijen [A factor analysis of four test batteries]. Nederlands Tijdschrift voor Psychologie, 18, 220–239.

Dolan, C. V., & Lubke, G. (2001). Viewing Spearman's hypothesis from the perspective of multigroup PCA: A comment on Schonemann's criticism. Intelligence, 29, 231–245.

Drenth, P. J. D., Petrie, J. F., & Bleichrodt, N. (1968). Handleiding bij de Amsterdamse Kinder Intelligentie Test [Manual of the Amsterdam Children Intelligence Test]. Amsterdam: Vrije Universiteit.

Elliott, C. D. (1983). British Ability Scales. Manual 2: Technical handbook. Windsor, Great Britain: NFER-Nelson.

Engelbrecht, M. (1999). Leerpotensiaal as voorspeller van akademiese sukses van universiteitsstudente [Learning potential as predictor of the academic success of university students]. Unpublished DPhil thesis, Potchefstroom University for Christian Higher Education, South Africa.

Ericsson, K. A., & Lehmann, A. C. (1996). Expert and exceptional performance: Evidence of maximal adaptation to task constraints. Annual Review of Psychology, 47, 273–305.

Evers A amp Lucassen W (1991) Handleiding DAT 83 DifferentieumlleAanleg Testserie [Manual DAT83 Differential Aptitude Testseries] Amsterdam Swets

Fleishman E A amp Hempel W E (1955) The relation betweenabilities and improvement with practice in a visual discriminationreaction task Journal of Experimental Psychology 49 301minus312

Flynn J R (1987) Massive IQ gains in 14 nations What IQ testsreally measure Psychological Bulletin 101 171minus191

Flynn J R (1999) Evidence against Rushton The genetic loading ofWISC-R subtests and the causes of between-group IQ differencesPersonality and Individual Differences 26 373minus379

Flynn J R (2000) IQ gains WISC subtests and fluid g g theory andthe relevance of Spearmans hypothesis to race In G R B JGoode (Ed) The nature of intelligence (pp 202minus227) New YorkWiley

Gaydon VP (1988) Predictors of performance of disadvantagedadolescents on the SowetoAlexandra gifted child programmeUnpublished M Ed dissertation University of the WitwatersrandSouth Africa

Gottfredson L S (1997) Why g matters The complexity of everydaylife Intelligence 24(1) 79minus132

Gottfredson L S (2002) g Highly general and highly practical In RJ Sternberg amp E L Grigorenko (Eds) The general intelligencefactor How general is it (pp 331minus380) Mahwah NJ Erlbaum

Grigorenko E L amp Sternberg R J (1998) Dynamic testing Psy-chological Bulletin 124 75minus111

HaeckW Yeld N Conradie J Robertson N amp Shall A (1997) Adevelopmental approach to mathematics testing for universityadmissions and course placement Educational Studies in Mathe-matics 33 71minus91

Hartmann P Kruuse NHS amp Nyborg H (in press) Testing thecross-racial generality of Spearmans hypothesis in two samplesIntelligence

Hausknecht J P Trevor C O amp Farr J L (2002) Retaking abilitytests in a selection setting Implications for practice effects trainingperformance and turnover Journal of Applied Psychology 87(2)243minus254

Hunter J E amp Schmidt F L (1990) Methods of meta-analysisLondon Sage

Hunter J E amp Schmidt F L (2004) Methods of meta-analysis (2nded) London Sage

Jensen, A. R. (1980). Bias in mental testing. London: Methuen.

Jensen, A. R. (1985). The nature of the black–white difference on various psychometric tests: Spearman's hypothesis. Behavioral and Brain Sciences, 8, 193–263.

Jensen, A. R. (1998a). The g factor: The science of mental ability. London: Praeger.

Jensen, A. R. (1998b). Adoption data and two g-related hypotheses. Intelligence, 25, 1–6.

Johnson, W., Bouchard, T. J., Krueger, R. F., Jr., McGue, M., & Gottesman, I. I. (2004). Just one g: Consistent results from three test batteries. Intelligence, 32, 95–107.

Johnson, W., te Nijenhuis, J., & Bouchard, T. J., Jr. (in press). Replication of the hierarchical visual-perceptual-image rotation model in de Wolff and Buiten's (1963) battery of 46 tests of mental ability. Intelligence.

Jones, R. J. (1986). A comparison of the predictive validity of the MCAT for coached and uncoached students. Journal of Medical Education, 61, 335–338.

Kaufman, A. S., & Kaufman, N. L. (1983). K-ABC: Kaufman Assessment Battery for Children. Interpretive manual. Circle Pines, MN: AGS.

J. te Nijenhuis et al. / Intelligence 35 (2007) 283–300

Kooij, A. P., Rolfhus, E., Wilkins, C., Yang, Z., & Zhu, J. (2005). WAIS-III Nederlandstalige bewerking: Technisch rapport hernormering [WAIS-III adaptation in Dutch: Technical report renorming]. Amsterdam: Harcourt.

Kort, W., Schittekatte, M., Dekker, P. H., Verhaeghe, P., Compaan, E. L., Bosmans, M., & Vermeir, G. (2005). WISC-III NL: Wechsler Intelligence Scale for Children, Derde Editie NL. Handleiding en verantwoording [The Dutch WISC-III: Wechsler Intelligence Scale for Children, Third Edition for the Netherlands. Manual and justification]. Amsterdam: NIP.

Kulik, J. A., Bangert-Drowns, R. L., & Kulik, C. C. (1984). Effectiveness of coaching for aptitude tests. Psychological Bulletin, 95, 179–188.

Kulik, J. A., Kulik, C. C., & Bangert, R. L. (1984). Effects of practice on aptitude and achievement test scores. American Educational Research Journal, 21, 435–447.

Lee, K. H., Choi, Y. Y., Gray, J. R., Cho, S. H., Chae, J.-H., Lee, S., et al. (2006). Neural correlates of superior intelligence: Stronger recruitment of posterior parietal cortex. NeuroImage, 29(2), 578–586.

Linn, R. L. (1977). On the treatment of multiple scores for Law School Admission Test repeaters (Report LSAC-77-4). In Law School Admission Council, Reports of LSAC Sponsored Research, Volume III, 1975–1977. Princeton, NJ: Law School Admission Council.

Lipson, L. E. (1992). Relationship of static and dynamic measures to scholastic achievement of black pupils. Unpublished M.Ed. dissertation, University of Witwatersrand, South Africa.

Lloyd, F., & Pidgeon, D. A. (1961). An investigation into the effects of coaching on non-verbal test material with European, Indian and African children. British Journal of Educational Psychology, 31, 145–151.

Luteijn, F., & Barelds, D. P. H. (2005). GIT2: Groninger Intelligentie Test 2 [GIT2: Groningen Intelligence Test 2]. Amsterdam: Harcourt.

Lynn, R. (1990). The role of nutrition in secular increases in intelligence. Personality and Individual Differences, 11, 273–285.

Lynn, R. (1998). In support of the nutrition theory. In U. Neisser (Ed.), The rising curve: Long-term gains in IQ and related measures (pp. 207–215). Washington, DC: American Psychological Association.

Lynn, R., Allik, J., & Irwing, P. (2004). Sex differences on three factors identified in Raven's Standard Progressive Matrices. Intelligence, 32, 411–424.

Lynn, R., & Hampson, S. (1986). The rise of national intelligence: Evidence from Britain, Japan and the USA. Personality and Individual Differences, 7, 23–32.

Matarazzo, J. D., Carmody, T. P., & Jacobs, L. D. (1980). Test–retest reliability and stability of the WAIS: A literature review with implications for clinical practice. Journal of Clinical Neuropsychology, 2(2), 89–105.

McCormick, B. K., Dunlap, W. P., Kennedy, R. S., & Jones, M. B. (1983). The effects of practice on the Armed Forces Vocational Aptitude Test Battery. US Army Research Institute for the Behavioral and Social Sciences, Technical Report 602.

Mulder, J. L., Dekker, R., & Dekker, P. H. (2004). Kaufman Intelligentietest voor adolescenten en volwassenen (KAIT): Handleiding [Kaufman Intelligence test for adolescents and adults (KAIT): Manual]. Leiden, the Netherlands: PITS.

Murphy, R. (2002). A review of South African research in the field of dynamic assessment. Unpublished MA dissertation, University of Pretoria (available online from http://upetd.up.ac.za/thesis/available/etd-05042002-161239).

Nel, A. (1997). Die voorspelling van akademiese sukses binne konteks van 'n alternatiewe universiteitstoelatingsbeleid [The prediction of academic success within the context of an alternative policy of university admission]. Unpublished MA dissertation, Rand Afrikaans University, South Africa.

Neubauer, A. C., & Freudenthaler, H. H. (1994). Reaction time in a sentence-picture verification test and intelligence: Individual strategies and effects of extended practice. Intelligence, 19, 193–218.

Nunnally, J. C., & Bernstein, I. H. (1994). Psychometric theory (3rd ed.). New York: McGraw-Hill.

Olsen, M., & Schrader, W. B. (1959). The use of preliminary and final Scholastic Aptitude Test scores in predicting college grades (College Entrance Examination Board Research and Development Reports and Statistical Reports 59-19). Princeton, NJ: Educational Testing Service.

Ortar, G. R. (1960). Improving test validity by coaching. Educational Research, 2, 137–142.

Powers, D. E. (1985). Effects of test preparation on the validity of Graduate Admission Test. Applied Psychological Measurement, 9, 179–190.

Raven, J., Raven, J. C., & Court, J. H. (2000). Standard Progressive Matrices. Raven manual: Section 3. Oxford Psychologists Press.

Ree, M. J., & Carretta, T. R. (1994). The correlation of general cognitive ability and psychomotor tracking tests. International Journal of Selection and Assessment, 2, 209–216.

Ree, M. J., & Earles, A. A. (1991). Predicting training success: Not much more than g. Personnel Psychology, 44, 321–332.

Ree, M. J., Earles, J. A., & Teachout, M. S. (1994). Predicting job performance: Not much more than g. Journal of Applied Psychology, 79, 518–524.

Reeve, C. L., & Lam, H. (2005). The psychometric paradox of practice effects due to retesting: Measurement invariance and stable ability estimates in the face of observed score changes. Intelligence, 33, 535–549.

Resing, W. C. M. (1990). Intelligentie en leerpotentieel: Een onderzoek naar het leerpotentieel van jonge leerlingen uit het basis- en speciaal onderwijs [Intelligence and learning potential: A study into the learning potential of young students in basic and special education]. Amsterdam, the Netherlands: Swets.

Rushton, J. P. (1999). Secular gains in IQ are not related to the g factor and inbreeding depression, unlike black–white differences: A reply to Flynn. Personality and Individual Differences, 26, 381–389.

Rushton, J. P., Skuy, M., & Bons, T. A. (2004). Construct validity of Raven's Advanced Progressive Matrices for African and non-African engineering students in South Africa. International Journal of Selection and Assessment, 12(3), 220–229.

Schmidt, F. L., & Hunter, J. E. (1998). The validity and utility of selection methods in personnel psychology: Practical and theoretical implications of 85 years of research findings. Psychological Bulletin, 124(2), 262–274.

Schmidt, F. L., & Hunter, J. E. (1999). Theory testing and measurement error. Intelligence, 27(3), 183–198.

Schmidt, F. L., & Le, H. (2004). Software for the Hunter-Schmidt meta-analysis methods. University of Iowa, Department of Management and Organization, Iowa City, IA 42242.

Schroots, J. J. F., & van Alphen de Veer, R. J. (1979). LDT: Leidse Diagnostische Test. Deel 1: Handleiding [LDT: Leiden Diagnostic Test. Part 1: Manual]. Lisse, the Netherlands: Swets.

Shochet, I. M. (1986). Manifest and potential performance in advantaged and disadvantaged students. Unpublished D.Phil. dissertation, University of the Witwatersrand, South Africa.


Skuy, M., Gewer, A., Osrin, Y., Khunou, D., Fridjon, P., & Rushton, J. P. (2002). Effects of mediated learning experience on Raven's Matrices scores of African and non-African university students in South Africa. Intelligence, 30, 221–232.

Swanson, H. E., & Lussier, C. M. (2001). A selective synthesis of the experimental literature on dynamic assessment. Review of Educational Research, 71, 321–363.

Teasdale, T. W., & Owen, D. R. (1989). Continuing secular increase in intelligence and a stable prevalence of high intelligence levels. Intelligence, 13, 255–262.

Tuma, J. M., & Appelbaum, A. S. (1980). Reliability and practice effects of WISC-R IQ estimates in a normal population. Educational and Psychological Measurement, 40, 671–678.

te Nijenhuis, J., Tolboom, E., Resing, W., & Bleichrodt, N. (2004). Does cultural background influence the intellectual performance of children from immigrant groups? Validity of the RAKIT intelligence test for immigrant children. European Journal of Psychological Assessment, 20, 10–26.

te Nijenhuis, J., & van der Flier, H. (1997). Comparability of GATB scores for immigrants and majority group members: Some Dutch findings. Journal of Applied Psychology, 82, 675–687.

te Nijenhuis, J., Voskuijl, O. F., & Schijve, N. B. (2001). Practice and coaching on IQ tests: Quite a lot of g. International Journal of Selection and Assessment, 9, 302–308.

Thorndike, R. L. (1985). The central role of general ability in prediction. Multivariate Behavioral Research, 20, 241–254.

Tucker, L. R., Damarin, F., & Messick, S. (1966). A base-free measure of change. Psychometrika, 31(4), 457–473.

van der Doef, M. P., Kwint, J. M., & van der Koppel (1989). Wat leren moeilijk lerende kinderen van de WISC-R? [What do children who have difficulties in learning learn from the WISC-R?] Kind en Adolescent, 10, 136–141.

United States Department of Labor (1970). Manual for the USTES General Aptitude Test Battery. Section III: Development. Washington, DC: United States Department of Labor.

van Geffen (1972). De betrouwbaarheid van de GATB 1002-B op brugklasniveau [The reliability of the GATB 1002-B for the first class at secondary school]. Catholic University Nijmegen, the Netherlands, Psychology of Work and Organisation.

van Haasen, P. P., de Bruyn, E. E. J., Pijl, Y. J., Poortinga, Y. H., Lutje Spelberg, H. C., Vander Steene, G., et al. (1986). WISC-R: Wechsler Intelligence Scale for Children-Revised, Nederlandstalige uitgave [WISC-R: Wechsler Intelligence Scale for Children-Revised, Dutch edition]. Lisse, the Netherlands: Swets.

Wechsler, D. (1955). Manual for the Wechsler Adult Intelligence Scale. New York: The Psychological Corporation.

Wechsler, D. (1967). Manual for the Wechsler Preschool and Primary Scale of Intelligence. New York: The Psychological Corporation.

Wechsler, D. (1974). Manual for the Wechsler Intelligence Scale for Children-Revised. New York: The Psychological Corporation.

Wechsler, D. (1981). WAIS-R manual: Wechsler Adult Intelligence Scale-Revised. New York: The Psychological Corporation.

Wechsler, D. (1997). WAIS-III: Wechsler Adult Intelligence Scale-third edition and WMS-III: Wechsler Memory Scale-third edition. Technical manual. New York: The Psychological Corporation.

Wicherts, J. W., Dolan, C. V., Oosterveld, P., van Baal, G. C. V., Boomsma, D. I., & Span, M. M. (2004). Are intelligence tests measurement invariant over time? Investigating the nature of the Flynn effect. Intelligence, 32(5), 509–537.

Yeld, N., & Haeck, W. (1997). Educational histories and academic potential: Can tests deliver? Assessment and Evaluation in Higher Education, 22, 5–16.

Zaaiman, H. (1998). Selecting students for Mathematics and Science: The challenge facing higher education in South Africa. Pretoria, South Africa: HSRC Publishers.

Zaaiman, H., van der Flier, H., & Thijs, G. D. (2001). Dynamic testing in selection for an educational programme: Assessing South African performance on the Raven Progressive Matrices. International Journal of Selection and Assessment, 9, 258–269.

Zolezzi, S. A. (1992). Alternative selection measures for university undergraduate admissions. Unpublished M.Ed. dissertation, University of the Witwatersrand, South Africa.

Zolezzi, S. A. (1995). The effectiveness of dynamic assessment as an alternative aptitude testing strategy. Unpublished D.Phil. dissertation, University of South Africa, South Africa.


(2001) showed that, after various forms of test preparation, the g loadedness of their test battery decreased from .53 to .49. Based on the work of Ackerman (1986, 1987), it can be concluded that through practice on cognitive tasks part of the performance becomes overlearned and automatic; the performance requires less controlled processing of information, which is reflected in lowered g loadings.

4. Second test of Jensen's hypothesis: studies on practice and coaching

Three studies on practice and coaching have shown increases in test scores that are not related to the g factor. This suggests that the gains are ‘empty’ or ‘hollow’. In the first study, Jensen (1998a, ch. 10) analyzed the effect of practice on the General Aptitude Test Battery (GATB). He found negative correlations, ranging from −.11 to −.86, between the effect sizes of practice and the tests' g loadings. Therefore, the gains were largest on the least cognitively complex tests. In the second study, te Nijenhuis et al. (2001) found a small correlation of −.08 for test practice and large negative correlations of −.87 for both of their test coaching conditions. Jensen carried out a factor analysis of the various GATB score gains and found two large factors that did not correlate with the g factor extracted from the GATB. Most likely, the score gains are not on the g factor or the broad abilities but on the test specificities, since te Nijenhuis et al. showed that practice and coaching reduce the g loadedness of their tests. In a third study (Coyle, 2006), factor analysis demonstrated that the change in aptitude test scores had a zero loading on the g factor.

So the studies on practice and coaching appear to support the theory. However, since only a few empirical studies have tested the link (or absence thereof) between test-score gains from practice and coaching and g loadings, replications are required before the conclusion can be firmly established. Therefore, we combined several such studies with various Dutch, British, and American test batteries into a meta-analysis.

5. Third test of Jensen's hypothesis: studies on learning potential

Jensen hypothesizes that the effects of training are not on g, but that the gains are empty and training should therefore not lead to increased predictive validity. Based on learning potential theory, one would come to the opposite prediction, namely that training leads to higher predictive validity. The fact that the theoretical framework of learning potential does not include the g factor is of no importance here; we solely focus on a prediction based on learning potential theory that is opposite to a prediction based on Jensen's theory, which rests on a hierarchical intelligence model. Some learning potential training studies report predictive validities of pre- and posttest scores. Based on Jensen's theory, one would predict (1) no higher predictive validity for learning potential tests in comparison with classical cognitive tests, and (2) no increase in predictive validity due to training when using posttest scores instead of pretest scores. However, based on learning potential theory, one would predict a substantial increase in predictive validity in both cases. So studies on learning potential constitute a test of Jensen's hypothesis.

A large number of studies have been carried out to check for learning potential beyond IQ scores, generally showing that scores go up substantially after mediation. Apart from theoretical considerations, dynamic tests should show higher criterion-related validities than classical IQ tests to justify the time-consuming procedure. Based on a lengthy review of most of the literature, Grigorenko and Sternberg (1998) concluded that the empirical data do not consistently show higher predictive power of dynamic tests compared with traditional tests. Murphy (2002) carried out an excellent and detailed review of all South African studies on learning potential, including virtually all of those missed by Grigorenko and Sternberg (probably because of difficulty of access). Many studies (Boeyens, 1989; de Villiers, 1999; Engelbrecht, 1999; Gaydon, 1988; Haeck, Yeld, Conradie, Robertson, & Shall, 1997; Lipson, 1992; Nel, 1997; Shochet, 1986; Skuy et al., 2002; Yeld & Haeck, 1997; Zaaiman, 1998; Zaaiman, van der Flier, & Thijs, 2001; Zolezzi, 1992, 1995) used data from the numerous South African university entrance programs that had adopted a dynamic framework for assessing disadvantaged, underprepared students. The aim of these programs was to give underprepared applicants an optimal chance to prove that they have the ability to succeed with further study. Again, it was found that while some South African studies show higher criterion-related validities for learning potential tests, the effect was not consistent.

However, in these studies the learning potential tests were compared against individual tests or an unweighted combination of a limited number of tests, but generally not against a full test battery, and in no case against g scores. g scores have been shown to yield higher predictive validities than individual tests or an unweighted score sum (Ree & Earles, 1991; Ree et al., 1994; Thorndike, 1985). So these were comparisons in which the cognitive predictor with the highest predictive validity was not used; instead, the dynamic tests were pitted against predictors with substantially lower predictive validities than g. As no direct comparisons were made between learning potential tests and g, it is not possible to conclude that g had higher predictive validity. However, a comparison of a learning potential test with one test, or with a combination of a limited number of tests, generally results in comparable predictive validities, and g scores clearly have higher predictive validities than one test or a combination of a limited number of tests. It is therefore not unlikely that a g score will have a higher predictive validity than a learning potential test score. This also suggests that the findings might best be interpreted as tentative support for Jensen's theory.

So the studies on learning potential appear to support the theory that score gains can be summarized in the hierarchical intelligence model. However, more direct tests of the theory are required; therefore, a learning potential study was reanalyzed.

6. Research questions

The research question of this study is whether score gains from test–retest studies and mediated interventions can be summarized in terms of Carroll's three-stratum hierarchical intelligence model. We examined whether (1) correlations between score gains and the g loadedness of the scores are negative in sign, (2) the g loadedness of scores decreases after mediation, and (3) low-g persons show the largest gains after the mediation training. We carried out a meta-analysis to provide a convincing answer to the first research question. In a more exploratory study on learning potential in South Africa, we sought support for all three research questions.

7. Test–retest studies

To test whether there is a negative correlation between the g loadings of tests and score gains, we carried out a meta-analysis of all test–retest studies of Dutch, British, and American test batteries available in the Netherlands. All studies were simple practice studies (no intervention, such as additional coaching, took place) and used well-validated tests.

8. Method

Psychometric meta-analysis (Hunter & Schmidt, 1990) aims to estimate what the results of studies would have been if all studies had been conducted without methodological limitations or flaws. The results of perfectly conducted studies would allow a less obstructed view of the underlying construct-level relationships (Schmidt & Hunter, 1999). One of the goals of the present meta-analysis is to obtain a reliable estimate of the true correlation between standardized test–retest score gains (d) and g. Although the construct of g has been thoroughly studied, the construct underlying score gains is less well understood. Another aim of the present study is to gain a clearer understanding of the construct underlying score gains by linking it to the g nexus. Carrying out a complete meta-analysis on the relationship between d and g would require the collection of a very large number of datasets. However, applying meta-analytical techniques to a sufficiently large number of studies will also lead to a reliable estimate of the true correlation between d and g. We therefore collected a large number of studies that are heterogeneous across various possible moderators.

To obtain a reliable correlation between g and d, we focused on batteries with a minimum of seven subtests. University libraries and test libraries were searched, and several members of the Dutch Testing Commission and test publishers were contacted. We limited ourselves to non-clinical samples without health problems. Only a minority of test manuals report test–retest studies; before 1970 they are especially rare. The search yielded virtually all test–retest studies available in the Netherlands. The GATB manual (1970, ch. 20) reports very large datasets on secondary school children who took the GATB with 1-, 2-, and 3-year intervals, respectively. At the time of the first test, large samples of children who had the same age as the test–retest children at the time of the second test also took the test. Through a comparison of the scores, the maturation effects could be separated from the test–retest effects, so we included these data in the present study.
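The maturation adjustment described above can be illustrated with a minimal sketch. It assumes that the retest group's second-occasion mean is compared with the first-occasion mean of the separate same-age group, and that the difference is scaled by that group's SD; the helper name `net_retest_gain` is hypothetical, and the GATB manual's actual computation may differ.

```python
from statistics import mean, stdev


def net_retest_gain(retest_second, same_age_first):
    """Standardized retest gain with maturation removed (hypothetical helper).

    retest_second: scores of the test-retest group on the second occasion.
    same_age_first: first-occasion scores of a different group that, when
        tested, had the same age as the retest group at its second occasion.
    Because both groups have the same age, the mean difference reflects
    prior exposure to the test rather than maturation. Scaling by the SD of
    the same-age first-time group is an assumption of this sketch.
    """
    gain = mean(retest_second) - mean(same_age_first)
    return gain / stdev(same_age_first)


# Made-up scores: means 14 vs. 12 with an SD of 2, i.e. a gain of 1 SD
print(net_retest_gain([12, 14, 16], [10, 12, 14]))
```

Comparing against a same-age group, rather than against the retest group's own first score, is what keeps ordinary developmental growth out of the gain.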

Standardized score gains were computed by dividing the raw score gain by the SD of the pretest. In general, g loadings were computed by submitting a correlation matrix to a principal axis factor analysis and using the loadings of the subtests on the first unrotated factor. In some cases, g loadings were taken from studies where other procedures were followed; these procedures have been shown empirically to lead to highly comparable results. Pearson correlations between the standardized score gains and the g loadings were computed.
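The three computations in this paragraph can be sketched with NumPy as follows. This is a minimal illustration, not the authors' software: it assumes a one-factor iterated principal axis solution (communalities start at the squared multiple correlations and are re-estimated as squared loadings), and the function names are hypothetical.

```python
import numpy as np


def standardized_gains(pre, post):
    """d per subtest: raw mean gain divided by the pretest SD.

    pre, post: arrays of shape (n_persons, n_subtests) for the same people.
    """
    pre = np.asarray(pre, dtype=float)
    post = np.asarray(post, dtype=float)
    return (post.mean(axis=0) - pre.mean(axis=0)) / pre.std(axis=0, ddof=1)


def first_factor_loadings(R, max_iter=500, tol=1e-8):
    """Loadings on the first factor of an iterated principal axis solution.

    Assumption: a single factor is extracted, so each communality is the
    squared loading on that factor. Starting values are squared multiple
    correlations, 1 - 1 / diag(R^-1).
    """
    R = np.asarray(R, dtype=float)
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    for _ in range(max_iter):
        reduced = R.copy()
        np.fill_diagonal(reduced, h2)               # communalities on the diagonal
        eigvals, eigvecs = np.linalg.eigh(reduced)  # eigenvalues in ascending order
        loadings = eigvecs[:, -1] * np.sqrt(max(eigvals[-1], 0.0))
        new_h2 = loadings ** 2
        converged = np.max(np.abs(new_h2 - h2)) < tol
        h2 = new_h2
        if converged:
            break
    # The sign of an eigenvector is arbitrary; orient the g loadings positively.
    return loadings if loadings.sum() >= 0 else -loadings


def r_gd(g_loadings, d):
    """Pearson correlation between the vector of g loadings and the vector of d."""
    return float(np.corrcoef(g_loadings, d)[0, 1])


# Illustration with a made-up one-factor correlation matrix
lam = np.array([0.8, 0.7, 0.6, 0.5])
R = np.outer(lam, lam)
np.fill_diagonal(R, 1.0)
g = first_factor_loadings(R)          # recovers lam closely
print(r_gd(g, [0.1, 0.2, 0.3, 0.4]))  # close to -1: gains largest on least g-loaded subtests
```

The sign flip at the end matters for r_gd: without it, an arbitrarily negative eigenvector would reverse the sign of the correlation.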

Psychometric meta-analytical techniques (Hunter & Schmidt, 1990, 2004) were applied to the resulting 64 r_gd values, using the software package developed by Schmidt and Le (2004). Psychometric meta-analysis is based on the principle that there are artifacts in every dataset and that most of these artifacts can be corrected. In the present study, we corrected for five artifacts, listed by Hunter and Schmidt (1990), that alter the value of outcome measures: (1) sampling error, (2) reliability of the vector of g loadings, (3) reliability of the vector of score gains, (4) restriction of range of g loadings, and (5) deviation from perfect construct validity.

8.1. Correction for sampling error

In many cases, sampling error explains the majority of the variation between studies, so the first step in a psychometric meta-analysis is to correct the collection of effect sizes for differences in sample size between the studies.
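The sampling-error step can be sketched with the standard Hunter and Schmidt "bare-bones" formulas: an N-weighted mean correlation, the N-weighted observed variance, and the expected sampling-error variance (1 − r̄²)² / (N̄ − 1). This is a minimal illustration under those textbook formulas; it is not the Schmidt and Le (2004) program, and the function name `bare_bones` is hypothetical.

```python
import numpy as np


def bare_bones(rs, ns):
    """Hunter-Schmidt bare-bones meta-analysis (sampling error only).

    rs: observed correlations (here, one r_gd per study).
    ns: sample sizes of the studies.
    """
    rs = np.asarray(rs, dtype=float)
    ns = np.asarray(ns, dtype=float)
    mean_r = np.sum(ns * rs) / np.sum(ns)                   # N-weighted mean r
    var_obs = np.sum(ns * (rs - mean_r) ** 2) / np.sum(ns)  # N-weighted observed variance
    var_err = (1 - mean_r ** 2) ** 2 / (ns.mean() - 1)      # expected sampling-error variance
    var_res = max(var_obs - var_err, 0.0)                   # residual variance, floored at 0
    pct = 100 * var_err / var_obs if var_obs > 0 else None  # % of variance due to sampling error
    return {"mean_r": float(mean_r), "var_obs": float(var_obs),
            "var_err": float(var_err), "var_res": float(var_res),
            "pct_explained": pct}


# Illustration only: three rows of Table 1, not the full set of 64 studies
print(bare_bones([-0.57, -0.07, -0.63], [100, 43, 60]))
```

Large studies dominate the weighted mean, which is why the very large GATB samples weigh heavily in any such summary.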

8.2. Correction for reliability of the vector of g loadings

The value of r_gd is attenuated by the reliability of the vector of g loadings for a given battery. When two samples have a comparable N, the average correlation between vectors is an estimate of the reliability of each vector. The collection of datasets in the present study included no g vectors for the same battery from different samples; therefore, artifact distributions were based upon other studies reporting g vectors for two or more samples. So the effect sizes and the distribution of reliabilities of the g vector were based upon different samples. When two g vectors were compared, the correlation between them was used, and when more than two g vectors were compared, the average correlation over the various pairs of vectors was used. The combined N of the samples on which the g vectors were based was taken as the weight of one data point.
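The reliability rule in this paragraph (one correlation for two vectors, the mean over all pairs for more than two, weighted by the combined N) can be sketched as follows. The function name `vector_reliability` is hypothetical.

```python
from itertools import combinations

import numpy as np


def vector_reliability(vectors, sample_sizes):
    """Reliability of a g vector estimated from comparable samples.

    vectors: two or more g vectors for the same battery, one per sample.
    sample_sizes: N of each sample.
    Returns the mean Pearson correlation over all pairs of vectors and the
    combined N, which serves as the weight of this single data point.
    """
    vs = [np.asarray(v, dtype=float) for v in vectors]
    pair_rs = [np.corrcoef(a, b)[0, 1] for a, b in combinations(vs, 2)]
    return float(np.mean(pair_rs)), int(sum(sample_sizes))


# Made-up g vectors for a four-subtest battery in two comparable samples
rel, weight = vector_reliability([[0.62, 0.71, 0.55, 0.80],
                                  [0.60, 0.74, 0.52, 0.78]], [500, 480])
print(rel, weight)
```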

Several samples were compared that differed little on background variables. For the comparisons using children, we chose samples that were highly comparable with regard to age, and for the comparisons of adults, we chose samples that were roughly comparable with regard to age. In a study on young children, Schroots and van Alphen de Veer (1979) report correlation matrices for the Leidse Diagnostische Test for eight age groups between 4 and 8 years of age. The average correlation between the adjacent age groups is .75 (combined N = 1169). Several studies report data on both younger and older children. The Dutch/Flemish WISC-R (van Haasen et al., 1986) has samples with comparable N of Dutch and Flemish children, so the 11 age groups between 6 and 16 could be compared. This resulted in an average correlation of .78 (combined N = 3018). Jensen (1985) reports g loadings of the 12 subtests of the WISC-R obtained in three large, independent, representative samples of Black and White children. The average correlation between the g vectors obtained for each sample is .86 for the Black children (combined N = 1238) and .93 for the White children (combined N = 2868). In a study on older children, Evers and Lucassen (1991) report the correlation matrices of the Dutch DAT. The average correlation between the g vectors of three educational groups is .88 (combined N = 3300). The US GATB manual (1970, ch. 20) gives correlation matrices for large groups of boys and girls in secondary school. The average correlation between the g vectors of the same-age boys and girls is .97 (combined N = 26708). Several studies report data on adults. g loadings of the eight subtests of the GATB are reported by te Nijenhuis and van der Flier (1997) for applicants at Dutch Railways and by de Wolff and Buiten (1963) for seamen at the Royal Dutch Navy, resulting in a correlation of .90 (combined N = 1306). The US GATB manual (1970) gives correlation matrices for two large groups of adults, which yields a correlation between g vectors of .94 (combined N = 4519). Johnson, Bouchard, Krueger, McGue, and Gottesman (2004) report g loadings for a sample that took the WAIS, and Wechsler (1955) reports the correlation matrices of the WAIS for adults of comparable age, so g loadings could be computed. The correlation between the g vectors for the two studies is .72 (combined N = 736). So it appears that g vectors are quite reliable, especially when the samples are very large.

The number of tests in the batteries in the present study varied from 7 to 14. The number of tests does not necessarily influence the size of r_gd, but it clearly has an effect upon its variability. Because variability in the values of the artifacts influences the amount of variance that artifacts explain in observed effect sizes, we estimated this variability using data from the samples described in the previous paragraph.
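In the Hunter and Schmidt artifact-distribution approach, an unreliability artifact is summarized by the mean and SD of its attenuation factor, the square root of the reliability. The sketch below applies that idea to the g-vector reliabilities and combined Ns reported in the preceding paragraph, weighting by combined N as described in Section 8.2. It is an illustration under those assumptions; the exact weighting and formulas inside the Schmidt and Le (2004) program may differ, so the printed values are not the paper's estimates.

```python
import numpy as np

# (reliability of the g vector, combined N) as reported in the text
g_vector_reliabilities = [
    (0.75, 1169), (0.78, 3018), (0.86, 1238), (0.93, 2868), (0.88, 3300),
    (0.97, 26708), (0.90, 1306), (0.94, 4519), (0.72, 736),
]


def attenuation_distribution(pairs):
    """N-weighted mean and SD of attenuation factors a = sqrt(reliability)."""
    rel = np.array([p[0] for p in pairs], dtype=float)
    n = np.array([p[1] for p in pairs], dtype=float)
    a = np.sqrt(rel)
    mean_a = np.sum(n * a) / np.sum(n)
    sd_a = np.sqrt(np.sum(n * (a - mean_a) ** 2) / np.sum(n))
    return float(mean_a), float(sd_a)


mean_a, sd_a = attenuation_distribution(g_vector_reliabilities)
print(f"mean a = {mean_a:.3f}, SD of a = {sd_a:.3f}")
```

Because the GATB secondary-school comparison (N = 26708) dominates the weights, the mean attenuation factor sits close to the square root of .97.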

8.3. Correction for reliability of the vector of score gains

The value of r_gd is attenuated by the reliability of the vector of score gains for a given battery. When two samples have a comparable N, the average correlation between vectors is an estimate of the reliability of each vector. The reliability of the vector of score gains was estimated using the present datasets, comparing samples that took the same test and that differed little on background variables. For the comparisons using children, we chose samples that were highly comparable with regard to age, and for the comparisons of adults, we chose samples that were roughly comparable with regard to age.

In the GATB manual (1970, ch. 15), 13 combinations of two studies are described in which large samples of men and women that are comparable with respect to age and background took the same GATB subtests. The average unweighted correlation between the d vectors of men and women is .83 (total N = 3760). In the GATB manual (1970, ch. 20), three combinations of three studies are described in which very large samples of boys and girls in the same grade in secondary school took the same GATB subtests. This yielded correlations between the d vectors of .99, .98, and .94, respectively (total N = 20541). Together, van Geffen (1972) and Bosch (1973) report three Dutch GATB test–retest studies on children in secondary school, resulting in three comparisons between d vectors. The average N-weighted correlation between the d vectors is .47 (total N = 127). Vectors of score gains from two different datasets on the WISC-R were compared: Tuma and Appelbaum (1980) tested children with an average age of 10, and Wechsler (1974) tested 10- and 11-year-olds. The correlation between the two d vectors is .71 (total N = 147). Comparison of vectors of score gains from datasets on the DAT (Bennett, Seashore, & Wesman, 1974) resulted in correlations of .78 and .73, respectively, so an average r of .76 (total N = 254). So it appears that d vectors are quite reliable, especially when the samples are very large. We estimated the reliabilities of the d vectors in the database using data from the samples described in this paragraph.
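Corrections 8.2 and 8.3 both rest on the classical correction for attenuation: dividing r_gd by the square root of the product of the two vector reliabilities. In the meta-analysis this is applied through artifact distributions rather than study by study; the sketch below only shows the single-correlation form, with made-up input values, and the function name is hypothetical.

```python
from math import sqrt


def disattenuate(r_gd, rel_g, rel_d):
    """Classical correction for unreliability in both vectors.

    rel_g: reliability of the vector of g loadings.
    rel_d: reliability of the vector of score gains.
    """
    return r_gd / sqrt(rel_g * rel_d)


# Made-up values: an observed r_gd of -.50 with vector reliabilities .81 and .64
print(disattenuate(-0.5, 0.81, 0.64))  # about -0.69
```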

8.4. Correction for restriction of range of g loadings

The value of r_gd is attenuated by the restriction of range of g loadings in many of the standard test batteries. The most highly g-loaded batteries tend to have the smallest range of variation in the subtests' g loadings. Jensen (1998a, pp. 381–382) shows that restriction in g loadedness strongly attenuates the correlation between g loadings and standardized group differences. Hunter and Schmidt (1990, pp. 47–49) state that the solution to range variation is to define a reference population and express all correlations in terms of that reference population. The Hunter and Schmidt meta-analytical program computes what the correlation in a given population would be if the standard deviation were the same as in the reference population. The standard deviations can be compared by dividing the study population standard deviation by the reference group population standard deviation, that is, $u = SD_{\text{study}} / SD_{\text{ref}}$. As the reference, we took the tests that are broadly regarded as exemplary for the measurement of the intelligence domain, namely the various versions of the Wechsler tests for children. The average standard deviation of g loadings of the various Dutch and US versions of the WISC-R and the WISC-III was 0.128. So the SD of g loadings of all test batteries was compared to the average SD of g loadings in the Wechsler tests for children. This resulted in some batteries, such as the GATB, having a value of u larger than 1.00.

8.5. Correction for deviation from perfect construct validity

The deviation from perfect construct validity in g attenuates the value of r_gd. In making up any collection of cognitive tests, we do not have a perfectly representative sample of the entire universe of all possible cognitive tests. So any one limited sample of tests will not yield exactly the same g as any other limited sample. The sample values of g are affected by psychometric sampling error, but the fact that g is very substantially correlated across different test batteries implies that the differing obtained values of g can all be interpreted as estimates of a “true” g. The value of r_gd is attenuated by psychometric sampling error in each of the batteries from which a g factor has been extracted.
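This correction follows the same disattenuation logic, applied once to the value of r_gd already corrected for the first four artifacts: divide by the correlation between obtained and “true” g loadings. In the sketch below that correlation is a parameter; its default of .85 is the authors' estimate, developed in the paragraph on the g saturation of composite scores. The function name is hypothetical, and the result is not clamped to the interval from −1 to 1.

```python
def correct_construct_validity(r_corrected, rho_true=0.85):
    """Divide by the assumed correlation between obtained and "true" g loadings.

    r_corrected: r_gd after the first four corrections.
    rho_true: correlation of empirical with "true" g loadings.
    """
    return r_corrected / rho_true


# Made-up input: an r_gd of -.68 after the first four corrections
print(correct_construct_validity(-0.68))  # about -0.80
```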

The more tests and the higher their g loadings, the higher the g saturation of the composite score. The Wechsler tests have a large number of subtests with quite high g loadings, resulting in a highly g-saturated composite score. Jensen (1998a, pp. 90–91) states that the g score of the Wechsler tests correlates more than .95 with the tests' IQ score. However, shorter batteries with a substantial number of tests with lower g loadings will lead to a composite with a somewhat lower g saturation. Jensen (1998a, ch. 10) states that the average g loading of an IQ score as measured by various standard IQ tests is in the +.80s. When we take this value as an indication of the degree to which an IQ score is a reflection of “true” g, we can estimate that a test's g score correlates about .85 with “true” g. As g loadings are the correlations of tests with the g score, it is most likely that most empirical g loadings will underestimate “true” g loadings, so empirical g loadings correlate about .85 with “true” g loadings. As the Schmidt and Le computer program only includes corrections for the first four artifacts, the correction for deviation from perfect construct validity was carried out on the value of r_gd after correction for the first four artifacts. To limit the

Table 1Dutch British and US studies of correlations between g loadings and gain scores

Reference Test r N Information

Drenth et al (1968) AKIT minus 57 100 Primary-school childrenvan Geffen (1972) GATB minus 45 42 Secondary-school children

minus 21 42Bosch (1973) GATB minus 07 43 Secondary-school childrenSchroots and van Alphen

de Veer (1979)LDT minus 42 96 Pre-school and secondary-school children

Bleichrodt et al (1987) RAKIT 09 49 Pre-school childrenminus 25 51 Primary-school childrenminus 21 49 Primary-school children

van der Doef et al (1989) WISC-R minus 69 22 Primary-school children with learning problemsMulder et al (2004) KAIT minus 23 46 Secondary-school children+young adults

minus 42 25 AdultsKort et al (2005) WISC-III minus 15 42 Primary-school children

minus 26 67 Primary-school childrenminus 46 39 Secondary-school children

Luteijn and Barelds (2005) GIT2 minus 51 44 AdultsKooij et al (2005) WAIS-III minus 63 60 AdultsElliott (1983) BAS minus 65 60 Primary-school childrenWechsler (1967) WPPSI minus 46 50 Pre-school childrenUnited States Department

of Labor (1970)GATB minus 35 156 Office applicants

minus 66 605 Male high school seniorsminus 70 554 Female high school seniorsminus 58 223 Males 1-day intervalminus 41 186 Females 1-day intervalminus 50 202 Males 2-week intervalminus 52 152 Females 2-week intervalminus 67 156 Males 6-week intervalminus 61 168 Females 6-week intervalminus 43 176 Males 13-week interval02 149 Females 13-week interval

minus 62 157 Males 26-week intervalminus 32 136 Females 26-week intervalminus 69 119 Males 1-year intervalminus 31 183 Females 1-year intervalminus 96 118 Males 2-year intervalminus 75 170 Females 2-year intervalminus 75 123 Males 3-year intervalminus 48 183 Females 3-year intervalminus 92 3398 Boys secondary schoolminus 92 3680 Girls secondary schoolminus 91 3348 Boys secondary schoolminus 91 3491 Girls secondary schoolminus 84 3229 Boys secondary schoolminus 87 3395 Girls secondary school

Wechsler (1974) WISC-R minus 48 97 Primary-school childrenminus 66 102 Primary-school childrenminus 21 104 Secondary-school children

Bennett et al (1974) DAT minus 79 92 Boys secondary schoolminus 53 81 Girls secondary schoolminus 29 81 Boys secondary schoolminus 62 100 Girls secondary school

Covin (1977) WISC-R minus 57 30 Primary-school children with learning problemsTuma and Appelbaum (1980) WISC-R minus 08 45 Primary- and secondary-school childrenMatarazzo et al (1980) WAIS minus 10 29 Young malesWechsler (1981) WAIS-R minus 64 71 Adults

minus 48 48 Adults

(continued on next page)

289J te Nijenhuis et al Intelligence 35 (2007) 283ndash300

Table 1 (continued)

Reference Test r N Information

McCormick et al (1983) ASVAB minus 73 57 adultsKaufman and Kaufman (1983) K-ABC minus 27 46 Pre-school children

minus 18 36 Pre- and primary-school childrenminus 22 70 Primary-school children

Wechsler (1997) WAIS-III minus 45 100 Young adultsminus 57 102 Adultsminus 51 104 Adults03 88 Adults

Reeve and Lam (2005) EAS minus 34 123 Undergraduate students

In general the g loadings were based on the correlation matrix taken from the manuals containing the testndashretest studies or from the correlation matrixbased on the largest sample size we could find What follows is a list of the sources of the g loading when not taken from the manuals containing thetestndashretest studyvan Geffen (1972) and Bosch (1973) de Wolff and Buiten (1963) see also Johnson te Nijenhuis and Bouchard (in press) Bleichrodt et al (1987) teNijenhuis et al (2004) who used the same data on which the RAKIT manual is based van der Doef Kwint and van der Koppel (1989) DutchWISC-R manual Elliott (1983) Table 98 Age 90ndash911 years US Dept of Labors GATB (1970) Jensen (1985 p 214) using the largestcorrelation matrix in the GATBmanual Wechsler (1974) Covin (1977) and Tuma and Appelbaum (1980) Jensen (1985 p 214 first study) Bennettet al (1974) average of four highly similar correlation matrices Matarazzo et al (1980) Wechslers (1955 p 17) Table 8 for ages 25ndash34McCormick et al (1983) Ree and Carretta (1994) Reeve and Lam (2005) utilize SEM analyses and use item parcels instead of full scale scores tocompute g loadings The average g loading of all the item parcels for a specific subtest was taken as the g loading of that specific subtest

290 J te Nijenhuis et al Intelligence 35 (2007) 283ndash300

risk of overcorrection we conservatively chose thevalue of 90 for the correction
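The correction for deviation from perfect construct validity described above divides the artifact-corrected estimate by the assumed correlation between empirical and “true” g loadings. A minimal Python sketch, assuming that divisor is the .90 chosen above; the function name is ours and is not part of the Schmidt and Le program:

```python
def correct_for_construct_validity(rho: float, validity: float = 0.90) -> float:
    """Disattenuate a correlation for imperfect construct validity of the g vector.

    rho      -- correlation already corrected for the first four artifacts
    validity -- assumed correlation between empirical and "true" g loadings
    """
    if not 0.0 < validity <= 1.0:
        raise ValueError("validity must lie in (0, 1]")
    return rho / validity


# The Results section reports rho = -.95 before this final step.
print(f"{correct_for_construct_validity(-0.95):.2f}")
```

Dividing −.95 by .90 gives about −1.06. Because the divisor is below 1, a corrected value can exceed 1 in absolute size, which is why such an estimate can fall outside the admissible range for a correlation.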

9. Results

The results of the studies on the correlation between g loadings and gain scores are shown in Table 1. The table gives data derived from 64 studies, with participants numbering a total of 26,990. The table gives the reference for the study, the cognitive ability test used, the correlation between g loadings and gain scores, the sample size, and background information on the study. It is clear that virtually all correlations are negative and that the few positive correlations are very small.

Table 2 shows the results of the psychometric meta-analysis of the 64 data points. It shows (from left to right) the number of correlation coefficients (K), the total sample size (N), the mean observed correlation (r) and its standard deviation (SDr), the true correlation one can expect once artifactual error from unreliability in the g vector and the d vector and from range restriction in the g vector has been removed (ρ), and its standard deviation (SDρ). The next two columns present the percentage of variance explained by artifactual errors (%VE) and the 95% credibility interval (95% CI). This interval denotes the values one can expect for ρ in 19 out of 20 cases.

Table 2. Meta-analysis results for correlations between g loadings and gain scores after corrections for reliability and restriction of range

| Studies included | K | N | r | SDr | ρ | SDρ | %VE | 95% CI |
|---|---|---|---|---|---|---|---|---|
| All | 64 | 26,990 | −.80 | .20 | −.95 | .11 | 81 | −0.74 to −1.16 |
| All minus 3 outliers | 61 | 26,704 | −.81 | .18 | −.95 | .03 | 99 | −0.91 to −1.00 |

Note. K = number of correlations; N = total sample size; r = mean observed correlation (sample size weighted); SDr = standard deviation of observed correlation; ρ = true correlation (observed correlation corrected for unreliability and range restriction); SDρ = standard deviation of true correlation; %VE = percentage of variance accounted for by artifactual errors; 95% CI = 95% credibility interval.

The large number of data points and the very large sample size indicate that we can have confidence in the outcomes of this meta-analysis. The estimated true correlation has a value of −.95, and 81% of the variance in the observed correlations is explained by artifactual errors. However, Hunter and Schmidt (1990) state that extreme outliers should be left out of the analyses, because they are most likely the result of errors in the data. They also argue that strong outliers artificially inflate the SD of effect sizes and thereby reduce the amount of variance that artifacts can explain. We chose to leave out three outliers (more than 4 SD below the average r and more than 8 SD below ρ), comprising 1% of the research participants. This resulted in no change in the value of the true correlation, a large decrease in the SD of ρ (by 74%), and a large increase in the amount of variance in the observed correlations explained by artifacts (by 22%). So, when the three outliers are excluded, artifacts explain virtually all of the variance in the observed correlations. Finally, a correction for deviation from perfect construct validity in g took place using a conservative value of .90. This resulted in a value of −1.06 for the final estimated true correlation between g loadings and score gains. Applying several corrections in a meta-analysis may lead to correlations that are larger than 1.00 or −1.00, as is the case here. Percentages of variance accounted for by artifacts larger than 100% are also not uncommon in psychometric meta-analysis. They also occur in other methods of statistical estimation (see Hunter & Schmidt, 1990, pp. 411–414, for a discussion).
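For readers unfamiliar with psychometric meta-analysis, the core of the Hunter and Schmidt procedure can be sketched in Python. This is a bare-bones version under stated assumptions: it corrects only for sampling error (the analyses above also corrected for unreliability and range restriction with the Schmidt and Le program, which is not reproduced here), it uses the common sampling-error variance formula (1 − r̄²)² / (N̄ − 1), and the function names are hypothetical:

```python
import math
from typing import Dict, Sequence, Tuple


def bare_bones_meta(rs: Sequence[float], ns: Sequence[int]) -> Dict[str, float]:
    """Sample-size-weighted mean r, observed variance, and the share of that
    variance expected from sampling error alone (no other artifacts)."""
    if len(rs) != len(ns) or len(rs) < 2:
        raise ValueError("need at least two studies with matching rs and ns")
    total_n = sum(ns)
    mean_r = sum(n * r for r, n in zip(rs, ns)) / total_n
    var_r = sum(n * (r - mean_r) ** 2 for r, n in zip(rs, ns)) / total_n
    mean_n = total_n / len(ns)
    var_e = (1 - mean_r ** 2) ** 2 / (mean_n - 1)  # expected sampling-error variance
    pct_ve = 100 * var_e / var_r if var_r > 0 else float("inf")
    sd_rho = math.sqrt(max(var_r - var_e, 0.0))  # residual SD after sampling error
    return {"mean_r": mean_r, "var_r": var_r, "var_e": var_e,
            "pct_ve": pct_ve, "sd_rho": sd_rho}


def credibility_interval(rho: float, sd_rho: float,
                         z: float = 1.96) -> Tuple[float, float]:
    """Range in which rho is expected to fall in 19 out of 20 cases."""
    return rho - z * sd_rho, rho + z * sd_rho


# Hypothetical toy input: two studies, r = -.50 (N = 100) and r = -.70 (N = 300).
print(bare_bones_meta([-0.50, -0.70], [100, 300]))
# Table 2, first row: rho = -.95, SD_rho = .11.
print(credibility_interval(-0.95, 0.11))
```

The second call returns roughly −1.17 to −0.73, within .01 of the bounds printed in Table 2; the published ρ and SDρ are themselves rounded to two decimals.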

10. Discussion

A large-scale meta-analysis of 64 test–retest studies shows that, after corrections for several artifacts, there is an estimated true correlation of −1.06 between the g loading of tests and score gains, and virtually all of the variance in observed correlations is attributable to these artifacts. As several artifacts explain virtually all the variance in the effect sizes, other dimensions on which the studies differ, such as age of the test takers, test–retest interval, test used, and average-IQ samples versus samples with learning problems, play no role at all.

The estimated true correlation of −1.06 is the result of various corrections for artifacts that attenuate the correlations. The estimated values of the artifacts may underestimate or overestimate the population values of the artifacts. Therefore, estimates of true effect sizes may overestimate or underestimate the population values of the effect size. As a solution to this problem, Hunter and Schmidt (2004) suggest carrying out several meta-analyses on the same construct and taking the average estimated effect size of all meta-analyses. The general idea is that meta-analysis is a powerful research tool but does not give perfect outcomes.

A correlation of −1.06 falls outside the range of acceptable values of a correlation, but one has to make a distinction between the meta-analytical estimate of the true correlation between g and d and the true correlation between g and d itself. We interpret the value of −1.06 for the meta-analytical estimate as meaning that the true correlation between g and d is −1.00. A correlation of −1.00 means that there is an inverse relationship between g and score gains: the tests with the highest g loadings show the smallest gains. The most straightforward interpretation of this very large negative correlation is that there is no g saturation in test–retest gain scores.

11. The South African learning potential study

In a carefully carried-out study, Skuy et al. (2002) used a dynamic testing procedure to see whether it would improve the scores of Black South African students on Raven's Standard Progressive Matrices (RSPM). The Bantu Education Act of 1954 established a discriminatory educational system characterized by poorly qualified teachers, sparsely equipped and funded schools, and generally poor quality. Most Black students in the sample had not received the same quality of education as White students. Black, White, Indian, and Colored research participants took the RSPM on two occasions, and in between, randomly constituted experimental groups were exposed to the Mediated Learning Experience. Both the Black South African group and the group consisting of White, Indian, and Colored South Africans improved over their baseline on the RSPM, and the Black group showed greater improvement.

The value of these cognitive interventions increases when the score gains are transferred to other tests and to external criteria such as school or work achievement. Therefore, the research participants also took Feuerstein's Representational Stencil Design Test as a transfer measure. The subject is presented with a stencil of a geometric design and then asked to point to which stencils need to be used, and in what sequence, in order to construct an identical design. Like the RSPM, the Stencils test also requires representational/abstract thinking, but the training on the RSPM showed little transfer to it. Moreover, the correlation of the RSPM scores with performance in the end-of-year psychology examination did not significantly improve after mediation. Once again, the score gains were empty: they did not generalize. Skuy et al. go on to ask what it was that their interventions improved. Professor Skuy made his data accessible to the present authors, so we could perform additional analyses.

12. Sample

The data from Skuy et al. (2002) were used, with the exception of data from three research participants, because their pretest IQ scores were extremely low (more than 3 SDs below the group mean). Ninety-five university students in psychology, aged 16 to 29 (mean age = 20, SD = 2.3; 25 males, 70 females), participated in this study. They were 66 Black students (20 males, 46 females) and 29 White (20), Indian (6), and Colored (3) students (5 males, 24 females). The mean age of the Black group was 20 (SD = 2.5) and of the White/Indian/Colored group 19 years (SD = 1). Subjects were randomly assigned to the experimental group (n = 55) and to the control group (n = 40).


13. Procedure

The students participated in pre- and posttest phases with a group intervention in between. The study focused on improvement in scores on the RSPM, using the Set Variations II of the Learning Propensity Assessment Device as the mediation task. Mediation training took 3 h and was conducted by three experienced psychologists with the assistance of six postgraduate psychology students. A detailed description is given in Skuy et al. (2002).

14. Measures and cognitive intervention

The Raven's Standard Progressive Matrices consists of 60 items (divided into 5 sets of 12 items) designed to measure the ability to form comparisons, to reason by analogy, and to organize spatial information into related wholes. It has been established as one of the purest measures of g (Jensen, 1998a). Skuy et al. (2002) found no evidence for test bias against Blacks in South African education. Rushton, Skuy, and Bons (2004) showed that the Raven's gave comparable predictive validities for students from various groups. Cross-cultural testing research has clearly shown that insufficient proficiency in the language of the test can lead to biased assessments in tests with a strong verbal component. However, the Raven's is a non-verbal test.

The Learning Propensity Assessment Device consists of 14 exercises. Each exercise contains an initial mediation task. Subsequent tasks increase in complexity and novelty and aim to assist the learner to achieve mastery over the task. The purpose of mediation is to assist the learner to develop the appropriate cognitive strategies and functions needed for the successful completion of the task. The Set Variations II of the Learning Propensity Assessment Device consists of five sets of items, which comprise variations of Sets C, D, and E of the RSPM test. Each set of variations contains a learning task for the purpose of initial mediation, followed by a series of progressively more difficult variations to which the skills learned must be applied. Mediation involves discussing with groups how to define the problem to be solved, focus on the task, set rules, regulate problem-solving behavior, and identify the correct sequence of logical steps needed to solve the task. Mediation also involves helping the subject to develop appropriate concepts, verbal tools, and insights in relation to the task. A detailed description is given in Skuy et al. (2002).

15. Statistical analyses

Although the Skuy et al. study is among the South African learning potential studies with the largest sample size, the N is not large. We therefore chose basic statistical analyses.

15.1. Descriptive statistics

Means, standard deviations, and reliabilities were computed for the various groups. With regard to measures of effect size, Hunter and Schmidt (1990, p. 271) advise choosing estimates of variance with the least error. Because repeated test takings tend to change the size of the SD (Ackerman, 1987), we chose the SD of the pretests for the denominator. The correlation between scores before and after the training was computed to see whether the training had an effect on the rank order of individuals' scores.
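Using the pretest SD as the denominator can be checked against the group means reported in Table 4. A short Python sketch (the function name is hypothetical) that recomputes the four effect sizes from those means and pretest SDs:

```python
def pretest_sd_effect_size(pre_mean: float, post_mean: float, pre_sd: float) -> float:
    """Standardized gain: d = (posttest mean - pretest mean) / pretest SD."""
    if pre_sd <= 0:
        raise ValueError("pre_sd must be positive")
    return (post_mean - pre_mean) / pre_sd


# (pretest mean, posttest mean, pretest SD) per group, as reported in Table 4.
groups = {
    "Black experimental": (43.78, 50.10, 6.64),
    "Black control": (45.46, 48.35, 6.69),
    "Other experimental": (50.20, 55.80, 6.05),
    "Other control": (52.71, 55.36, 3.45),
}
for name, (pre, post, sd) in groups.items():
    d = pretest_sd_effect_size(pre, post, sd)
    # Multiply by 15 (the conventional IQ SD) to express d on an IQ-like scale.
    print(f"{name}: d = {d:.2f}, about {15 * d:.0f} IQ points")
```

The results round to the 0.95, 0.43, 0.93, and 0.77 shown in Table 4; for the Black experimental group, d × 15 gives roughly 14 IQ points.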

15.2. Correlation between score gains and g loadedness

Because our sample was not large and quite specific, estimates of g loadedness were taken from Lynn, Allik, and Irwing's (2004) item analysis of the RSPM in Estonia, using a large (N = 2735) nationally representative sample. The same reasoning as in psychometric meta-analysis applies, namely that larger samples give better estimates of g loadings than smaller samples. In a hierarchical factor analysis of the items using structural equation modeling, Lynn et al. computed g loadings of 52 of the 60 items. In the present study, Pearson correlations were calculated between the g loadings of these 52 items and the effect sizes on these items.
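At the item level, the method of correlated vectors reduces to a Pearson correlation between two vectors, computed only over the items that have both a g loading and a gain (here, 52 of 60). A minimal Python sketch; the helper names, item numbers, and values in the example are hypothetical, not Lynn et al.'s loadings:

```python
import math
from typing import Mapping, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation of two equal-length vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two vectors of equal length (at least 2)")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlated_vectors(g_loadings: Mapping[int, float],
                       gains: Mapping[int, float]) -> float:
    """Correlate g loadings with item gains over the items present in both."""
    common = sorted(set(g_loadings) & set(gains))
    return pearson([g_loadings[i] for i in common], [gains[i] for i in common])


# Hypothetical example: item 5 has a gain but no g loading, so it is skipped.
g = {1: 0.20, 2: 0.40, 3: 0.60, 4: 0.80}
d = {1: 0.50, 2: 0.40, 3: 0.30, 4: 0.20, 5: 0.90}
print(round(correlated_vectors(g, d), 2))
```

In this toy case the gains fall exactly as the loadings rise, so the vector correlation is −1; real item data would give a weaker, noisier value.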

15.3. g loadings

The RSPM consists of dichotomous items, so we computed a correlation matrix of polychoric correlations (Nunnally & Bernstein, 1994). A principal axis factor analysis was carried out. The percentage of variance explained by the first unrotated factor was taken as an estimate of g loadedness. Because sample size was limited, we collapsed the experimental and the control groups.
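The percentage of variance explained by the first unrotated factor can be sketched as one-factor iterated principal axis factoring. This pure-Python version rests on assumptions of ours: it takes a correlation matrix as given (computing polychoric correlations from the 0/1 item data is not shown), it starts the communalities at each row's largest absolute off-diagonal correlation (the starting values used in the study are not reported here), and it finds the first eigenpair by power iteration. The function names and the example matrix are hypothetical; the matrix is built from known loadings so the answer can be checked:

```python
import math
from typing import List, Sequence, Tuple


def first_eigenpair(a: List[List[float]], iters: int = 200) -> Tuple[float, List[float]]:
    """Dominant eigenvalue and unit eigenvector by power iteration (assumes the
    largest-magnitude eigenvalue is positive, as in reduced correlation matrices)."""
    n = len(a)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iters):
        w = [sum(a[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    av = [sum(a[i][j] * v[j] for j in range(n)) for i in range(n)]
    return sum(vi * avi for vi, avi in zip(v, av)), v


def first_factor_share(r: Sequence[Sequence[float]], max_iter: int = 500,
                       tol: float = 1e-10) -> Tuple[float, List[float]]:
    """One-factor iterated principal axis factoring.

    Returns the proportion of total variance explained by the first unrotated
    factor and the loadings on that factor."""
    p = len(r)
    # Starting communalities: largest absolute off-diagonal correlation per row.
    h = [max(abs(r[i][j]) for j in range(p) if j != i) for i in range(p)]
    loadings: List[float] = []
    for _ in range(max_iter):
        # Reduced correlation matrix: communalities replace the unit diagonal.
        reduced = [[h[i] if i == j else float(r[i][j]) for j in range(p)]
                   for i in range(p)]
        eigval, vec = first_eigenpair(reduced)
        loadings = [math.sqrt(max(eigval, 0.0)) * x for x in vec]
        new_h = [x * x for x in loadings]
        converged = max(abs(a - b) for a, b in zip(new_h, h)) < tol
        h = new_h
        if converged:
            break
    return sum(x * x for x in loadings) / p, loadings


# Hypothetical one-factor correlation matrix built from known loadings.
lam = [0.8, 0.7, 0.6, 0.5]
R = [[1.0 if i == j else lam[i] * lam[j] for j in range(4)] for i in range(4)]
share, loads = first_factor_share(R)
print(f"first factor explains {share:.1%} of the variance")
```

For that matrix the sketch recovers the loadings .8, .7, .6, and .5 and a share of (.64 + .49 + .36 + .25) / 4 = .435. With real item data, the function would be run separately on the pretest and posttest matrices and the two shares compared.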

15.4. Correlation between sum scores and score gains

We tested whether individuals with low g improved their scores more than those with high g by correlating gain scores with pretest RSPM scores for each of the four research groups. As gain scores tend to be negatively correlated with pretest scores as a function of unreliability (see Cronbach, 1990; Nunnally & Bernstein, 1994), we corrected the correlations using Tucker, Damarin, and Messick's (1966) formula 63. Using the formula, one adds to each correlation the term (SD pretest / SD gain score) × (1 − reliability pretest).

Table 3. Proportion of sample selecting the correct answer on items of Raven's Standard Progressive Matrices, by group

| Set A item | Black | Other^a | Set B item | Black | Other | Set C item | Black | Other | Set D item | Black | Other | Set E item | Black | Other |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1.00 | 1.00 | 13 | 1.00 | 1.00 | 25 | 1.00 | .97 | 37 | 1.00 | 1.00 | 49 | .74 | .90 |
| 2 | .97 | 1.00 | 14 | 1.00 | 1.00 | 26 | .96 | 1.00 | 38 | .99 | 1.00 | 50 | .64 | .90 |
| 3 | .97 | 1.00 | 15 | 1.00 | 1.00 | 27 | .96 | 1.00 | 39 | .89 | 1.00 | 51 | .79 | .97 |
| 4 | 1.00 | .97 | 16 | .91 | .97 | 28 | .86 | .93 | 40 | .92 | 1.00 | 52 | .56 | .83 |
| 5 | 1.00 | 1.00 | 17 | .96 | .97 | 29 | .94 | .97 | 41 | .96 | 1.00 | 53 | .52 | .83 |
| 6 | .99 | 1.00 | 18 | .85 | 1.00 | 30 | .76 | .83 | 42 | .92 | 1.00 | 54 | .35 | .76 |
| 7 | .94 | .97 | 19 | .77 | .66 | 31 | .88 | .97 | 43 | .77 | 1.00 | 55 | .42 | .79 |
| 8 | .91 | .93 | 20 | .79 | .97 | 32 | .50 | .79 | 44 | .76 | .93 | 56 | .21 | .69 |
| 9 | 1.00 | .97 | 21 | .83 | .97 | 33 | .74 | .90 | 45 | .71 | .97 | 57 | .30 | .41 |
| 10 | .91 | .97 | 22 | .92 | 1.00 | 34 | .61 | .79 | 46 | .79 | .93 | 58 | .12 | .41 |
| 11 | .83 | .90 | 23 | .80 | .90 | 35 | .53 | .69 | 47 | .29 | .41 | 59 | .02 | .17 |
| 12 | .68 | .83 | 24 | .59 | .83 | 36 | .06 | .35 | 48 | .26 | .38 | 60 | .11 | .21 |

a Other = White, Indian, and Colored.
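Why gain scores correlate negatively with pretest scores even when nothing real changes, and what the Tucker, Damarin, and Messick (1966) term removes, can be shown with a small simulation. The sketch assumes classical test theory with independent errors and a known pretest reliability of .80 (in the study, reliability is estimated); the seed, sample size, and helper names are ours:

```python
import math
import random
from typing import Sequence


def mean(x: Sequence[float]) -> float:
    return math.fsum(x) / len(x)


def pstdev(x: Sequence[float]) -> float:
    """Population standard deviation."""
    m = mean(x)
    return math.sqrt(math.fsum((a - m) ** 2 for a in x) / len(x))


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = mean(x), mean(y)
    sxy = math.fsum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / (len(x) * pstdev(x) * pstdev(y))


def tucker_corrected_r(r: float, sd_pre: float, sd_gain: float, rel_pre: float) -> float:
    """Add (SD pretest / SD gain) * (1 - pretest reliability) to r."""
    return r + (sd_pre / sd_gain) * (1.0 - rel_pre)


# Simulated retest with no true change: pre and post share the same true score.
rng = random.Random(2007)
rel = 0.80  # pretest reliability, fixed by construction
n = 20_000
true_scores = [rng.gauss(0.0, math.sqrt(rel)) for _ in range(n)]
pre = [t + rng.gauss(0.0, math.sqrt(1 - rel)) for t in true_scores]
post = [t + rng.gauss(0.0, math.sqrt(1 - rel)) for t in true_scores]
gain = [b - a for a, b in zip(pre, post)]

r_obs = pearson(pre, gain)
r_corr = tucker_corrected_r(r_obs, pstdev(pre), pstdev(gain), rel)
print(f"observed r = {r_obs:.2f}, corrected r = {r_corr:.2f}")
```

Under these assumptions the observed pretest–gain correlation comes out near −.32, produced purely by pretest measurement error, and adding the correction term brings it back to about zero.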

16. Results

16.1. Descriptive statistics

Internal consistencies (Cronbach's αs) on the RSPM ranged from .76 to .86 for the pre- and posttests, respectively. Table 3 shows the proportion of each of the groups that selected the correct answer on each of the 60 items of the pretest. Across the 60 items, the order of the p values was almost identical for Blacks and White/Indian/Coloreds (r = .92, p = .00).
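The internal consistencies above are Cronbach's alphas; for 0/1 items such as those of the RSPM, alpha computed with population variances is identical to KR-20. A small Python sketch of the computation, using a made-up 4-person, 3-item response matrix (not RSPM data) and a hypothetical function name:

```python
from statistics import pvariance
from typing import Sequence


def cronbach_alpha(scores: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha; rows are persons, columns are items.

    With 0/1 items and population variances this equals KR-20."""
    k = len(scores[0])
    if k < 2:
        raise ValueError("need at least two items")
    item_vars = [pvariance([row[j] for row in scores]) for j in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Made-up responses: 4 persons x 3 items (1.0 = correct).
responses = [[1.0, 1.0, 1.0], [1.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
print(round(cronbach_alpha(responses), 2))
```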

Table 4 shows the means and standard deviations for the total RSPM scores for the four groups, along with the d effect sizes representing the difference between pre- and posttest scores (Cohen, 1988). First, we examined whether there was an effect of race (Black vs. White/Indian/Colored) and group (experimental vs. control) on the pretest scores. There was a significant effect due to race (F(1, 91) = 24.13, p = .00, η² = .21) but not group (F(1, 91) = 2.28, p = .14, η² = .02). This means that mean pretest scores of Blacks (M = 44.44, SD = 6.65) were lower than those of White/Indian/Coloreds (M = 51.41, SD = 5.05), and that mean pretest scores of experimental and control groups were comparable (M = 45.53, SD = 7.04 and M = 48, SD = 6.7, respectively).

Table 4. Pre- and posttest mean Raven's scores, standard deviations, and mean effect sizes for Black and White/Indian/Colored students

| Group | Pretest M | Pretest SD | Pretest percentile | Posttest M | Posttest SD | Posttest percentile | Effect size |
|---|---|---|---|---|---|---|---|
| Black experimental (n = 40) | 43.78 | 6.64 | 14 | 50.10 | 5.31 | 41 | 0.95 |
| Black control (n = 26) | 45.46 | 6.69 | 16 | 48.35 | 6.71 | 31 | 0.43 |
| Other^a experimental (n = 15) | 50.20 | 6.05 | 41 | 55.80 | 3.76 | 75 | 0.93 |
| Other control (n = 14) | 52.71 | 3.45 | 55 | 55.36 | 3.43 | 68 | 0.77 |

Note. Means and SDs are raw scores. Percentiles are based on US adult norms; see Raven, Raven, and Court (2000), Table SPM13.
a Other = White, Indian, and Colored.

Secondly, we investigated the effects of training on the posttest scores by performing a two-way ANCOVA on the total posttest scores, with race and group as factors and the total pretest scores as the covariate. There was a significant effect for group (F(1, 95) = 13.81, p = .00, η² = .13) and for race (F(1, 90) = 3.99, p = .05, η² = .04), but not for the two-way interaction of group and race (F(1, 90) = 0.28, p = .60, η² = .00). These results indicate that the training was equally effective for both the Black and White/Indian/Colored students. Posttest scores of Blacks (M = 49.41, SD = 5.91), however, remained significantly lower (F(1, 91) = 28.33, p = .00) than those of Whites/Indians/Coloreds (M = 55.59, SD = 3.55). Although posttest scores of the experimental group (M = 51.65, SD = 5.53) were higher than those of the control group (M = 50.8, SD = 6.65), differences between both groups were nonsignificant (F(1, 91) = 0.85, p = .36).

The correlation between scores before and after the training was .84 (p = .00) for the experimental group and .90 (p = .00) for the control group, showing that the training had only a limited effect on the rank order of individuals' scores. This means that the test strongly, but not perfectly, measures the same constructs on both occasions.

16.2. Correlation between score gains and g loadedness

We estimated effect sizes for each of the four groups (race by condition) by computing the difference between mean pretest scores and posttest scores, divided by the standard deviation of the pretest scores of Black and White/Indian/Colored students, respectively. Finally, we calculated the correlations between effect sizes and the g loadings taken from Lynn et al. Correlations were −.24 (p = .10) for the Black experimental group, −.21 (p = .20) for the White/Indian/Colored experimental group, −.08 (p = .59) for the Black control group, and −.41 (p = .01) for the White/Indian/Colored control group. Small sample sizes usually attenuate correlations (Hunter & Schmidt, 1990). Collapsing the groups indeed resulted in higher average correlations: −.39 for the complete experimental group and −.26 for the complete control group.

16.3. g loadings

Using the combined experimental and control group, a principal axis factor analysis on the pretest and posttest scores, respectively, resulted in a first unrotated factor explaining 22% of the variance in the pretest scores and 18% of the variance in the posttest scores. These findings suggest that the g loadedness of the RSPM decreased substantially after the Mediated Learning Experience.

16.4. Correlation between score gains and sum score

Correlating score gains with RSPM total scores resulted in values of −.60 (p = .00) for the Black experimental group, −.18 (p = .38) for the Black control group, −.82 (p = .00) for the White/Indian/Colored experimental group, and −.48 (p = .08) for the White/Indian/Colored control group. After the use of the correction formula of Tucker et al. (1966), these correlations became −.39, −.08, −.61, and −.35, respectively. Overall, these correlations show that low-g persons improved their scores more strongly than high-g persons.

17. Discussion

Skuy et al. (2002) hypothesized that the low-quality education of Blacks in South Africa would lead to an underestimate of their cognitive abilities by IQ tests. Groups of Black and White/Indian/Colored students took the Raven's Progressive Matrices twice and in between received Feuerstein's Mediated Learning Experience. The test scores went up substantially in all groups. Evidence for an authentic change in the g factor requires broad transfer or generalizability across a wide variety of cognitive performance. However, Skuy et al. show that the gains did not generalize to scores on another, highly similar test or to external criteria, and were therefore hollow. As the score gains were in some cases quite large (14 IQ points for the Black experimental group), the question becomes: what is it that improved?

The findings show that the correlations between score gains and g loadedness of the items were −.39 for the complete experimental group and −.26 for the complete control group. However, because the g loadings and gain scores are measured at the item level, their reliabilities are not high, resulting in substantial attenuation of the correlation between g and d. Moreover, the RSPM does not measure g perfectly: Jensen (1998a, p. 91) estimates its g loading at .83. When we estimate the reliability of the g vector at .70 and the reliability of the gain score vector at .50, corrections for unreliability and for deviation from perfect construct validity of g only would result in estimated true correlations of −.80 and −.53, respectively. These values should be taken as underestimates; controlling for additional artifacts will bring them closer to the very strong negative correlation found in the meta-analysis.
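The two corrected values can be reproduced from the stated inputs: divide the observed vector correlation by the square root of the product of the two vector reliabilities and by the g loading of the RSPM. A minimal Python sketch (the function name is ours; the inputs are the values given above):

```python
import math


def disattenuate(r: float, rel_g: float, rel_d: float, g_validity: float = 1.0) -> float:
    """Correct r for unreliability of both vectors and for imperfect
    construct validity of the g vector (g_validity = 1 means no correction)."""
    return r / (math.sqrt(rel_g * rel_d) * g_validity)


# Inputs stated in the text: reliabilities .70 (g vector) and .50 (gain vector),
# RSPM g loading .83 (Jensen, 1998a).
for label, r_obs in [("experimental", -0.39), ("control", -0.26)]:
    rho = disattenuate(r_obs, rel_g=0.70, rel_d=0.50, g_validity=0.83)
    print(f"{label}: {r_obs:.2f} -> {rho:.2f}")
```

Unrounded, the results are about −.79 and −.53, within .01 of the values reported in the text.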

The findings suggest that after training, the g loadedness of the test decreased substantially. We found substantial negative correlations between gain scores and RSPM total scores. Table 4 shows that the total score variance decreased after training, which is in line with low-g subjects increasing more than high-g subjects. Since, as a rule, high-g individuals profit the most from training (as is reflected in the ubiquitous positive correlation between IQ scores and training performance; Jensen, 1980; Schmidt & Hunter, 1998), these findings could be interpreted as an indication that Feuerstein's Mediated Learning Experience is not g-loaded, in contrast with regular trainings that are clearly g-loaded. Substantial negative correlations between gain scores and RSPM total scores are no definite proof of this hypothesis, but are in line with it. Additional substantiation of our hypothesis that the Feuerstein training has no or little g loadedness is that Coyle (2006) showed that gain scores loaded virtually zero on the g factor. Moreover, Skuy et al. reported that the predictive validity of their measure did not increase when the second Raven score was used. The fact that individuals with low g gained more than those with high g could be interpreted as an indication that the Mediated Learning Experience was not g-loaded. It should be noted, however, that Feuerstein most likely did not intend his intervention to be g-loaded. He was interested in increasing the performance of low scorers on both tests and external criteria.

18. General discussion

IQ scores are by far the best general predictor of success in education, job training, and work. However, there are many ways in which these IQ scores can be increased, for instance by means of retesting or participating in a learning potential training program. What conclusions can be drawn from such score gains? Jensen's (1998a) hypothesis that the effects of training on abilities can be summarized in terms of Carroll's three-stratum hierarchical factor model was tested in a meta-analysis on test–retest data using Dutch, British, and American test batteries, and with learning potential data from South Africa using Raven's Progressive Matrices. The meta-analysis convincingly shows that test–retest score gains are not g-loaded. The findings from the learning potential study are clearly in line with this: when the attenuation caused by unreliability and other artifacts is taken into account, the correlation between g loadings of items and gains on items has a value that is somewhat comparable to the one found in the meta-analysis for test batteries. The data suggest that the g loadedness of item scores decreases after the intervention training. Te Nijenhuis et al.'s (2001) finding that practice and coaching reduced the g loadedness of their test scores strengthens the present findings using item scores. The findings show that it is not the high-g participants who increase their scores the most, as is common in training situations, but the low-g persons who show the largest increases. This suggests that the intervention training is not g-loaded.

Our findings fit quite well with the hierarchical model of intelligence. The generalizability of test scores resides predominantly in the g component, whereas the test-specific ability component and the narrow ability component are virtually non-generalizable. This is evidenced, for instance, by the earlier finding that adding verbal tests or numerical tests to a g score resulted in only a very small incremental validity (Ree & Earles, 1991; Ree et al., 1994). Additionally, Ericsson and Lehmann (1996) reported immense gains for a memory task focusing on one narrow ability but did not find any improvement for comparable memory tasks focusing on another narrow ability. As the score gains are not related to g, the generalizable g component decreases, and since it is not unlikely that the Feuerstein training itself is not g-loaded, it is easy to understand why the score gains did not generalize to scores on the cognitively loaded Representational Stencil Design Test. For a similar reason, the score gains did not generalize to g-loaded external criteria: the correlation of the RSPM scores with performance in the end-of-year psychology examination did not significantly improve after mediation. Reeve and Lam (2005) claimed that retesting does not change the nature of what is being tested, but our findings suggest the opposite.

19. Limitations of the studies

Our meta-analysis and our analysis of the South African study are strongly based on the method of correlated vectors (MCV), which has recently been shown to have limitations. Dolan and Lubke (2001) have shown that, when comparing groups, substantial positive vector correlations can still be obtained even when groups differ not only on g but also on factors uncorrelated with g. Ashton and Lee (2005) show that associations of a variable with non-g sources of variance can produce a vector correlation of zero even when the variable is strongly associated with g. They suggest that the g loadings of a subtest are sensitive to the nature of the other subtests in a battery, so that a specific sample of subtests may cause a spurious correlation between the vectors. Notwithstanding these limitations, studies using MCV continue to appear (see, for instance, Colom, Haier, & Jung, in press; Hartmann, Kruuse, & Nyborg, in press; Lee et al., 2006). The outcomes of our meta-analysis of a large number of studies using the method of correlated vectors may make an interesting contribution to the discussion on the limitations of the method.

A principle of meta-analysis is that the amount of information contained in one individual study is quite modest. Therefore, one should carry out an analysis of all studies on one topic and correct for artifacts, leading to a strong increase in the amount of information. The fact that our meta-analytical value of r = −1.06 is virtually identical to the theoretically expected correlation between g and d of −1.00 holds some promise that a psychometric meta-analysis of studies using MCV is a powerful way of reducing some of the limitations of MCV. An alternative methodological approach is to limit oneself to the rare datasets enabling the use of structural equation modeling. However, from a meta-analytical point of view, these studies yield only a quite modest amount of information.

Additional meta-analyses of studies employing MCV are necessary to establish the validity of the combination of MCV and psychometric meta-analysis. Most likely, many would agree that a high positive meta-analytical correlation between measures of g and measures of another construct implies that g plays a major role, and that a meta-analytical correlation of −1.00 implies that g plays no role. However, it is not clear what value of the meta-analytical correlation to expect from MCV when g plays only a modest role. After the present meta-analysis on a construct that clearly has an inverse relationship with g, it would be informative to carry out meta-analyses of studies on variables that are strongly linked to g and on variables that are modestly linked to g. An example of the latter would be secular score gains, which according to Lynn's (1990) nutrition theory should be modestly g-loaded.

The sample sizes in the South African study are not large, but still larger than those in many other studies of learning potential, where an N ≈ 10 is not unusual. A reanalysis of the many existing studies on dynamic testing could lead to a meta-analysis with a large combined N. The mean posttest score was quite high, so a ceiling effect may have taken place for the White/Indian/Colored group, leading to an underestimation of the experimental score gain for this group.

Instead of testing the hypothesis with a strongly unidimensional test such as the RSPM, it would be better to use a multidimensional test. Moreover, a large sample size would allow the use of more rigorous data-analytical techniques, leading to more definitive results. However, to the best of our knowledge, datasets meeting these requirements do not exist, and the Skuy et al. study is arguably the best South African learning potential study.

20. Score gains as low-quality measures of motivation

As criterion-related validity is strongly dependent on g, te Nijenhuis et al.'s finding of lowered g loadings after training should result in lowered criterion-related validity. However, the empirical findings show the opposite: virtually all test–retest and test preparation studies on cognitive tests and scholastic aptitude tests that reported criterion-related validities for both test administrations demonstrate small to modest increases in criterion-related validity for the second or third test score (see Allalouf & Ben-Shakhar, 1998; Bashi, 1976; Coyle, 2006; Hausknecht, Trevor, & Farr, 2002; Jones, 1986; Linn, 1977; Olsen & Schrader, 1959; Ortar, 1960; Powers, 1985; Reeve & Lam, 2005). In the carefully designed study by Allalouf and Ben-Shakhar (1998) of a university entrance test, the experimental group received an intensive 40-h test coaching program, while the control group did not. The criterion-related validity for the retest increased for both groups. Most importantly, the increase was the same; it was not larger for the experimental group.

In a little-known but carefully designed large-scale learning potential study, Resing (1990; see Table 4.23) compared an experimental group that received a pretest, a learning potential training, and a posttest against a control group that received only the pretest and the posttest. The mean criterion-related validity of the various second scores was .62 for both the experimental and the control group. Learning potential training did not result in incremental criterion-related validity over and above the validity resulting from simply retesting. The findings from both Resing and Allalouf and Ben-Shakhar suggest that cognitive interventions do not increase criterion-related validity more than simple retesting does.

g and the personality measure conscientiousness have been shown to make an excellent combination of predictors (Schmidt & Hunter, 1998). Conscientiousness represents, among other characteristics, persistence, a will to achieve, and the ability to focus effort on the goal. A field study on test preparation using actual job applicants (Clause, Delbridge, Schmitt, Chan, & Jennings, 2001) showed that motivation to perform well on the test correlated .25 with test performance. One could speculate that score increases do not reflect a true cognitive component but rather become low-quality measures of motivation. Further, since the increase in validity due to retesting and learning potential training is modest in comparison to the large increase obtainable from the use of personality questionnaires, personality testing might provide a less expensive and more accurate alternative.
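The claim that a personality measure adds more validity to g than retesting does can be made concrete with the standard formula for the multiple correlation of a criterion with two predictors. The sketch below is illustrative only: the validities (.51 for g, .31 for conscientiousness) and the zero intercorrelation are assumed values chosen for the example, not results of the present study, and `multiple_r` is a hypothetical helper name.

```python
import math


def multiple_r(r1: float, r2: float, r12: float) -> float:
    """Multiple correlation R of a criterion with two predictors.

    r1, r2 -- criterion-related validities of predictors 1 and 2
    r12    -- correlation between the two predictors (|r12| < 1)
    """
    r_squared = (r1**2 + r2**2 - 2 * r1 * r2 * r12) / (1 - r12**2)
    return math.sqrt(r_squared)


# Assumed, illustrative values (not estimates from this study).
validity_g = 0.51
validity_conscientiousness = 0.31
intercorrelation = 0.0

combined = multiple_r(validity_g, validity_conscientiousness, intercorrelation)
print(f"g alone:               {validity_g:.2f}")
print(f"g + conscientiousness: {combined:.2f}")
print(f"incremental validity:  {combined - validity_g:.2f}")
```

Because the two predictors are assumed to be uncorrelated here, each contributes unique variance. If the second predictor's validity were fully explained by its overlap with g (r2 = r1 × r12), R would not rise above r1 at all.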

21. Effectiveness of various training formats

Components of the mediation training used by Skuy et al. (2002) are similar to the test training used in te Nijenhuis et al. (2001). Both the Dutch training and the South African training took 3 h, but whereas the Dutch training focused on two different test formats, the South African training dealt with only one test format. The test training by Lloyd and Pidgeon (1961) took even less time, namely two half-hour segments, each focusing on one test format. The effect sizes in all studies were roughly comparable. This suggests that the methodologies employed by te Nijenhuis et al. and by Lloyd and Pidgeon were more efficient than those used by Skuy et al. It is possible that the components of the mediation training that are not present in the other two training formats are not effective in raising test scores and could therefore be left out. If so, it might be possible to increase scores on the RSPM by one SD with a relatively simple 1-h training.

22. Generalizability of findings

Can these findings of hollow score gains after test–retest, test practice, and Mediated Learning Experience training be generalized to other studies where training-induced score gains were found? Ericsson and Lehmann (1996) reported tremendous score increases after intensive training on numeric memory tests, but these gains did not generalize in the least to verbal memory tests. Such gains on one narrow ability do not generalize to another narrow ability clustering under the same broad ability and are therefore hollow. Similarly, Jensen (1998b) showed that score gains due to adoption were not on the g factor and were therefore most likely hollow.

Rushton (1999) argued that intergenerational score gains are not linked to g, suggesting that the Flynn effect may be empty, but he was strongly criticized by Flynn (1999, 2000). In studies on the Flynn effect, the score gains found in cross-sectional studies are largest on the RSPM (Flynn, 1987). Lynn (1998) has suggested that a substantial part of these intergenerational score gains on the RSPM is generalizable (it does reflect higher g), but that the remaining part is hollow and should be interpreted as schooling effects. The RSPM does require the application of the mathematical principles of addition, subtraction, progression, and the distribution of values. In the three decades (1950s–1980s) over which these increases in RSPM scores occurred, increasing proportions of 15- to 18-year-olds remained in school, where they learned math skills that they could apply to the solution of matrices problems. Our findings could be interpreted as support for Lynn's hypothesis of the partial hollowness of score gains on the RSPM. Notwithstanding the high g loading of the sum score of the RSPM, it is quite sensitive to test–retest effects and training effects. Some studies on the Flynn effect (Lynn & Hampson, 1986; Teasdale & Owen, 1989) show that the increase in scores is largely concentrated in the lower segments of the IQ distribution. Our finding that low scorers show the largest gains after training may additionally support the notion that part of the Flynn effect on the RSPM is hollow. Finally, Wicherts et al.'s (2004) findings show that in some of their datasets the secular score gains are most strongly linked to broad, narrow, and test-specific abilities, showing that an important part of the gains is non-generalizable.

Ceci (1991) showed that increased schooling leads to higher IQ scores, but are these gains highly specific or predominantly generalizable? It would be interesting to apply the techniques we used in this study to the findings from previous intervention studies. It may be that biological interventions (such as diet, vitamin supplements, and vaccination against infectious disease), rather than psychological or educational interventions, are the most cost-effective method of producing true changes in g and broad abilities. It may be that there is a biological barrier between the first stratum and the second stratum that restricts the effects of behavioral interventions to narrow abilities and test specificities.

Acknowledgement

We would like to thank Mervyn Skuy for his permission to use his dataset.

Thanks to Marié de Beer, Raegan Murphy, Welko Tomic, Art Jensen, and Frank Schmidt for feedback on previous versions of this paper.

Thanks to Arne Evers, Wilma Resing (Dutch Test Committee), and Andress Kooij (Harcourt) for their help in locating test–retest studies.

References

Ackerman, P. L. (1986). Individual differences in information processing: An investigation of intellectual abilities. Intelligence, 10, 101–139.

Ackerman, P. L. (1987). Individual differences in skill learning: An integration of psychometric and information processing skills. Psychological Bulletin, 102, 3–27.

Allalouf, A., & Ben-Shakhar, G. (1998). The effect of coaching on the predictive validity of scholastic aptitude tests. Journal of Educational Measurement, 35(1), 31–47.

Ashton, M. C., & Lee, K. (2005). Problems with the method of correlated vectors. Intelligence, 33, 431–444.

Bashi, Y. (1976). Verbal and non-verbal abilities of 4th, 6th and 8th grade students in the Arab educational system in Israel. Jerusalem: Hebrew University, School of Education.


Bleichrodt, N., Resing, W. C. M., Drenth, P. J. D., & Zaal, J. N. (1987). Intelligentie-meting bij kinderen: Empirische en methodologische verantwoording van de geReviseerde Amsterdamse Kinder Intelligentie Test [Measuring the intelligence of children: Empirical and methodological justification of the Revised Amsterdam Children Intelligence Test]. Lisse, the Netherlands: Swets.

Bennett, G. K., Seashore, H. G., & Wesman, A. G. (1974). Differential Aptitude Tests (5th ed.): Manual. New York: The Psychological Corporation.

Boeyens, J. C. A. (1989). Learning potential: An empirical investigation. Pretoria, South Africa: Human Science Research Council.

Bosch, F. (1973). Inventarisatie, beschrijving en onderzoek m.b.t. de wijzigingen van de GATB, incl. test-hertest onderzoek (No. Pz3bRp0120) [Stock-taking, description and research concerning the modifications of the GATB, including a test–retest study]. Utrecht, the Netherlands: Nederlandse Spoorwegen.

Carroll, J. B. (1993). Human cognitive abilities: A survey of factor-analytic studies. Cambridge University Press.

Ceci, S. J. (1991). How much does schooling influence general intelligence and its cognitive components? A reassessment of the evidence. Developmental Psychology, 27, 703–722.

Christian, K., Bachman, H. J., & Morrison, F. J. (2001). Schooling and cognitive development. In R. J. Sternberg & E. L. Grigorenko (Eds.), Environmental effects on cognitive abilities (pp. 287–335). Mahwah, NJ: Erlbaum.

Clause, C. S., Delbridge, K., Schmitt, N., Chan, D., & Jennings, D. (2001). Test preparation activities and employment test performance. Human Performance, 14, 149–167.

Cohen, J. (1988). Statistical power analysis for the behavioral sciences. Hillsdale: Lawrence Erlbaum.

Colom, R., Jung, R. E., & Haier, R. J. (in press). Finding the g-factor in brain structure using the method of correlated vectors. Intelligence.

Covin, T. A. (1977). Stability of the WISC-R for 9-year-olds with learning difficulties. Psychological Reports, 40, 1297–1298.

Coyle, T. R. (2006). Test–retest changes on scholastic aptitude tests are not related to g. Intelligence, 34, 15–27.

Cronbach, L. J. (1990). Essentials of psychological testing. New York: HarperCollins.

de Villiers, A. B. (1999). Disadvantaged students' academic performance: Analysing the zone proximal development. Unpublished DPhil thesis, University of Cape Town, South Africa.

de Wolff, C. J., & Buiten, B. (1963). Een factoranalyse van vier testbatterijen [A factor analysis of four test batteries]. Nederlands Tijdschrift voor Psychologie, 18, 220–239.

Dolan, C. V., & Lubke, G. (2001). Viewing Spearman's hypothesis from the perspective of multigroup PCA: A comment on Schönemann's criticism. Intelligence, 29, 231–245.

Drenth, P. J. D., Petrie, J. F., & Bleichrodt, N. (1968). Handleiding bij de Amsterdamse Kinder Intelligentie Test [Manual of the Amsterdam Children Intelligence Test]. Amsterdam: Vrije Universiteit.

Elliott, C. D. (1983). British Ability Scales. Manual 2: Technical handbook. Windsor, Great Britain: NFER-Nelson.

Engelbrecht, M. (1999). Leerpotensiaal as voorspeller van akademiese sukses van universiteitsstudente [Learning potential as predictor of the academic success of university students]. Unpublished DPhil thesis, Potchefstroom University for Christian Higher Education, South Africa.

Ericsson, K. A., & Lehmann, A. C. (1996). Expert and exceptional performance: Evidence of maximal adaptation to task constraints. Annual Review of Psychology, 47, 273–305.

Evers, A., & Lucassen, W. (1991). Handleiding DAT 83: Differentiële Aanleg Testserie [Manual DAT 83: Differential Aptitude Test series]. Amsterdam: Swets.

Fleishman, E. A., & Hempel, W. E. (1955). The relation between abilities and improvement with practice in a visual discrimination reaction task. Journal of Experimental Psychology, 49, 301–312.

Flynn, J. R. (1987). Massive IQ gains in 14 nations: What IQ tests really measure. Psychological Bulletin, 101, 171–191.

Flynn, J. R. (1999). Evidence against Rushton: The genetic loading of WISC-R subtests and the causes of between-group IQ differences. Personality and Individual Differences, 26, 373–379.

Flynn, J. R. (2000). IQ gains, WISC subtests and fluid g: g theory and the relevance of Spearman's hypothesis to race. In G. R. B. J. Goode (Ed.), The nature of intelligence (pp. 202–227). New York: Wiley.

Gaydon, V. P. (1988). Predictors of performance of disadvantaged adolescents on the Soweto/Alexandra gifted child programme. Unpublished MEd dissertation, University of the Witwatersrand, South Africa.

Gottfredson, L. S. (1997). Why g matters: The complexity of everyday life. Intelligence, 24(1), 79–132.

Gottfredson, L. S. (2002). g: Highly general and highly practical. In R. J. Sternberg & E. L. Grigorenko (Eds.), The general intelligence factor: How general is it? (pp. 331–380). Mahwah, NJ: Erlbaum.

Grigorenko, E. L., & Sternberg, R. J. (1998). Dynamic testing. Psychological Bulletin, 124, 75–111.

Haeck, W., Yeld, N., Conradie, J., Robertson, N., & Shall, A. (1997). A developmental approach to mathematics testing for university admissions and course placement. Educational Studies in Mathematics, 33, 71–91.

Hartmann, P., Kruuse, N. H. S., & Nyborg, H. (in press). Testing the cross-racial generality of Spearman's hypothesis in two samples. Intelligence.

Hausknecht, J. P., Trevor, C. O., & Farr, J. L. (2002). Retaking ability tests in a selection setting: Implications for practice effects, training performance, and turnover. Journal of Applied Psychology, 87(2), 243–254.

Hunter, J. E., & Schmidt, F. L. (1990). Methods of meta-analysis. London: Sage.

Hunter, J. E., & Schmidt, F. L. (2004). Methods of meta-analysis (2nd ed.). London: Sage.

Jensen, A. R. (1980). Bias in mental testing. London: Methuen.

Jensen, A. R. (1985). The nature of the black–white difference on various psychometric tests: Spearman's hypothesis. Behavioral and Brain Sciences, 8, 193–263.

Jensen, A. R. (1998a). The g factor: The science of mental ability. London: Praeger.

Jensen, A. R. (1998b). Adoption data and two g-related hypotheses. Intelligence, 25, 1–6.

Johnson, W., Bouchard, T. J., Jr., Krueger, R. F., McGue, M., & Gottesman, I. I. (2004). Just one g: Consistent results from three test batteries. Intelligence, 32, 95–107.

Johnson, W., te Nijenhuis, J., & Bouchard, T. J., Jr. (in press). Replication of the hierarchical visual-perceptual-image rotation model in de Wolff and Buiten's (1963) battery of 46 tests of mental ability. Intelligence.

Jones, R. J. (1986). A comparison of the predictive validity of the MCAT for coached and uncoached students. Journal of Medical Education, 61, 335–338.

Kaufman, A. S., & Kaufman, N. L. (1983). K-ABC: Kaufman Assessment Battery for Children. Interpretive manual. Circle Pines, MN: AGS.


Kooij, A. P., Rolfhus, E., Wilkins, C., Yang, Z., & Zhu, J. (2005). WAIS-III Nederlandstalige bewerking: Technisch rapport hernormering [WAIS-III adaptation in Dutch: Technical report renorming]. Amsterdam: Harcourt.

Kort, W., Schittekatte, M., Dekker, P. H., Verhaeghe, P., Compaan, E. L., Bosmans, M., & Vermeir, G. (2005). WISC-III NL: Wechsler Intelligence Scale for Children, Derde Editie NL. Handleiding en verantwoording [The Dutch WISC-III: Wechsler Intelligence Scale for Children, Third Edition for the Netherlands. Manual and justification]. Amsterdam: NIP.

Kulik, J. A., Bangert-Drowns, R. L., & Kulik, C. C. (1984). Effectiveness of coaching for aptitude tests. Psychological Bulletin, 95, 179–188.

Kulik, J. A., Kulik, C. C., & Bangert, R. L. (1984). Effects of practice on aptitude and achievement test scores. American Educational Research Journal, 21, 435–447.

Lee, K. H., Choi, Y. Y., Gray, J. R., Cho, S. H., Chae, J.-H., Lee, S., et al. (2006). Neural correlates of superior intelligence: Stronger recruitment of posterior parietal cortex. NeuroImage, 29(2), 578–586.

Linn, R. L. (1977). On the treatment of multiple scores for Law School Admission Test repeaters (Report LSAC-77-4). In Law School Admission Council, Reports of LSAC sponsored research: Volume III, 1975–1977. Princeton, NJ: Law School Admission Council.

Lipson, L. E. (1992). Relationship of static and dynamic measures to scholastic achievement of black pupils. Unpublished MEd dissertation, University of Witwatersrand, South Africa.

Lloyd, F., & Pidgeon, D. A. (1961). An investigation into the effects of coaching on non-verbal test material with European, Indian and African children. British Journal of Educational Psychology, 31, 145–151.

Luteijn, F., & Barelds, D. P. H. (2005). GIT2: Groninger Intelligentie Test 2 [GIT2: Groningen Intelligence Test 2]. Amsterdam: Harcourt.

Lynn, R. (1990). The role of nutrition in secular increases in intelligence. Personality and Individual Differences, 11, 273–285.

Lynn, R. (1998). In support of the nutrition theory. In U. Neisser (Ed.), The rising curve: Long-term gains in IQ and related measures (pp. 207–215). Washington, DC: American Psychological Association.

Lynn, R., Allik, J., & Irwing, P. (2004). Sex differences on three factors identified in Raven's Standard Progressive Matrices. Intelligence, 32, 411–424.

Lynn, R., & Hampson, S. (1986). The rise of national intelligence: Evidence from Britain, Japan and the USA. Personality and Individual Differences, 7, 23–32.

Matarazzo, J. D., Carmody, T. P., & Jacobs, L. D. (1980). Test–retest reliability and stability of the WAIS: A literature review with implications for clinical practice. Journal of Clinical Neuropsychology, 2(2), 89–105.

McCormick, B. K., Dunlap, W. P., Kennedy, R. S., & Jones, M. B. (1983). The effects of practice on the Armed Forces Vocational Aptitude Test Battery. US Army Research Institute for the Behavioral and Social Sciences, Technical Report 602.

Mulder, J. L., Dekker, R., & Dekker, P. H. (2004). Kaufman Intelligentietest voor adolescenten en volwassenen (KAIT): Handleiding [Kaufman Intelligence Test for adolescents and adults (KAIT): Manual]. Leiden, the Netherlands: PITS.

Murphy, R. (2002). A review of South African research in the field of dynamic assessment. Unpublished MA dissertation, University of Pretoria (available online from http://upetd.up.ac.za/thesis/available/etd-05042002-161239).

Nel, A. (1997). Die voorspelling van akademiese sukses binne konteks van 'n alternatiewe universiteitstoelatingsbeleid [The prediction of academic success within the context of an alternative policy of university admission]. Unpublished MA dissertation, Rand Afrikaans University, South Africa.

Neubauer, A. C., & Freudenthaler, H. H. (1994). Reaction time in a sentence-picture verification test and intelligence: Individual strategies and effects of extended practice. Intelligence, 19, 193–218.

Nunnally, J. C., & Bernstein, I. H. (1994). Psychometric theory (3rd ed.). New York: McGraw-Hill.

Olsen, M., & Schrader, W. B. (1959). The use of preliminary and final Scholastic Aptitude Test scores in predicting college grades (College Entrance Examination Board Research and Development Reports and Statistical Reports 59-19). Princeton, NJ: Educational Testing Service.

Ortar, G. R. (1960). Improving test validity by coaching. Educational Research, 2, 137–142.

Powers, D. E. (1985). Effects of test preparation on the validity of Graduate Admission Test. Applied Psychological Measurement, 9, 179–190.

Raven, J., Raven, J. C., & Court, J. H. (2000). Standard Progressive Matrices: Raven manual, Section 3. Oxford Psychologists Press.

Ree, M. J., & Carretta, T. R. (1994). The correlation of general cognitive ability and psychomotor tracking tests. International Journal of Selection and Assessment, 2, 209–216.

Ree, M. J., & Earles, J. A. (1991). Predicting training success: Not much more than g. Personnel Psychology, 44, 321–332.

Ree, M. J., Earles, J. A., & Teachout, M. S. (1994). Predicting job performance: Not much more than g. Journal of Applied Psychology, 79, 518–524.

Reeve, C. L., & Lam, H. (2005). The psychometric paradox of practice effects due to retesting: Measurement invariance and stable ability estimates in the face of observed score changes. Intelligence, 33, 535–549.

Resing, W. C. M. (1990). Intelligentie en leerpotentieel: Een onderzoek naar het leerpotentieel van jonge leerlingen uit het basis- en speciaal onderwijs [Intelligence and learning potential: A study into the learning potential of young students in basic and special education]. Amsterdam, the Netherlands: Swets.

Rushton, J. P. (1999). Secular gains in IQ are not related to the g factor and inbreeding depression, unlike black–white differences: A reply to Flynn. Personality and Individual Differences, 26, 381–389.

Rushton, J. P., Skuy, M., & Bons, T. A. (2004). Construct validity of Raven's Advanced Progressive Matrices for African and non-African engineering students in South Africa. International Journal of Selection and Assessment, 12(3), 220–229.

Schmidt, F. L., & Hunter, J. E. (1998). The validity and utility of selection methods in personnel psychology: Practical and theoretical implications of 85 years of research findings. Psychological Bulletin, 124(2), 262–274.

Schmidt, F. L., & Hunter, J. E. (1999). Theory testing and measurement error. Intelligence, 27(3), 183–198.

Schmidt, F. L., & Le, H. (2004). Software for the Hunter-Schmidt meta-analysis methods. University of Iowa, Department of Management and Organization, Iowa City, IA 42242.

Schroots, J. J. F., & van Alphen de Veer, R. J. (1979). LDT: Leidse Diagnostische Test. Deel 1: Handleiding [LDT: Leiden Diagnostic Test. Part 1: Manual]. Lisse, the Netherlands: Swets.

Shochet, I. M. (1986). Manifest and potential performance in advantaged and disadvantaged students. Unpublished DPhil dissertation, University of the Witwatersrand, South Africa.


Skuy, M., Gewer, A., Osrin, Y., Khunou, D., Fridjon, P., & Rushton, J. P. (2002). Effects of mediated learning experience on Raven's Matrices scores of African and non-African university students in South Africa. Intelligence, 30, 221–232.

Swanson, H. E., & Lussier, C. M. (2001). A selective synthesis of the experimental literature on dynamic assessment. Review of Educational Research, 71, 321–363.

Teasdale, T. W., & Owen, D. R. (1989). Continuing secular increase in intelligence and a stable prevalence of high intelligence levels. Intelligence, 13, 255–262.

Tuma, J. M., & Appelbaum, A. S. (1980). Reliability and practice effects of WISC-R IQ estimates in a normal population. Educational and Psychological Measurement, 40, 671–678.

te Nijenhuis, J., Tolboom, E., Resing, W., & Bleichrodt, N. (2004). Does cultural background influence the intellectual performance of children from immigrant groups? Validity of the RAKIT intelligence test for immigrant children. European Journal of Psychological Assessment, 20, 10–26.

te Nijenhuis, J., & van der Flier, H. (1997). Comparability of GATB scores for immigrants and majority group members: Some Dutch findings. Journal of Applied Psychology, 82, 675–687.

te Nijenhuis, J., Voskuijl, O. F., & Schijve, N. B. (2001). Practice and coaching on IQ tests: Quite a lot of g. International Journal of Selection and Assessment, 9, 302–308.

Thorndike, R. L. (1985). The central role of general ability in prediction. Multivariate Behavioral Research, 20, 241–254.

Tucker, L. R., Damarin, F., & Messick, S. (1966). A base-free measure of change. Psychometrika, 31(4), 457–473.

van der Doef, M. P., Kwint, J. M., & van der Koppel (1989). Wat leren moeilijk lerende kinderen van de WISC-R? [What do children who have difficulties in learning learn from the WISC-R?]. Kind en Adolescent, 10, 136–141.

United States Department of Labor (1970). Manual for the USTES General Aptitude Test Battery. Section III: Development. Washington, DC: United States Department of Labor.

van Geffen (1972). De betrouwbaarheid van de GATB 1002-B op brugklasniveau [The reliability of the GATB 1002-B for the first class at secondary school]. Catholic University Nijmegen, the Netherlands, Psychology of Work and Organisation.

van Haasen, P. P., de Bruyn, E. E. J., Pijl, Y. J., Poortinga, Y. H., Lutje Spelberg, H. C., Vander Steene, G., et al. (1986). WISC-R: Wechsler Intelligence Scale for Children-Revised, Nederlandstalige uitgave [WISC-R: Wechsler Intelligence Scale for Children-Revised, Dutch edition]. Lisse, the Netherlands: Swets.

Wechsler, D. (1955). Manual for the Wechsler Adult Intelligence Scale. New York: The Psychological Corporation.

Wechsler, D. (1967). Manual for the Wechsler Preschool and Primary Scale of Intelligence. New York: The Psychological Corporation.

Wechsler, D. (1974). Manual for the Wechsler Intelligence Scale for Children-Revised. New York: The Psychological Corporation.

Wechsler, D. (1981). WAIS-R manual: Wechsler Adult Intelligence Scale-Revised. New York: The Psychological Corporation.

Wechsler, D. (1997). WAIS-III: Wechsler Adult Intelligence Scale-third edition and WMS-III: Wechsler Memory Scale-third edition. Technical manual. New York: The Psychological Corporation.

Wicherts, J. M., Dolan, C. V., Oosterveld, P., van Baal, G. C. V., Boomsma, D. I., & Span, M. M. (2004). Are intelligence tests measurement invariant over time? Investigating the nature of the Flynn effect. Intelligence, 32(5), 509–537.

Yeld, N., & Haeck, W. (1997). Educational histories and academic potential: Can tests deliver? Assessment and Evaluation in Higher Education, 22, 5–16.

Zaaiman, H. (1998). Selecting students for Mathematics and Science: The challenge facing higher education in South Africa. Pretoria, South Africa: HSRC Publishers.

Zaaiman, H., van der Flier, H., & Thijs, G. D. (2001). Dynamic testing in selection for an educational programme: Assessing South African performance on the Raven Progressive Matrices. International Journal of Selection and Assessment, 9, 258–269.

Zolezzi, S. A. (1992). Alternative selection measures for university undergraduate admissions. Unpublished MEd dissertation, University of the Witwatersrand, South Africa.

Zolezzi, S. A. (1995). The effectiveness of dynamic assessment as an alternative aptitude testing strategy. Unpublished DPhil dissertation, University of South Africa, South Africa.


where the cognitive predictor with the highest predictive validity was not used, but where the dynamic tests were pitted against predictors with substantially lower predictive validities than g. As no direct comparisons were made between learning potential tests and g, it is not possible to conclude that g had higher predictive validity. However, since a comparison of a learning potential test with one test or a combination of a limited number of tests generally results in comparable predictive validities, and g scores clearly have higher predictive validities than one test or a combination of a limited number of tests, it is not unlikely that a g score will have a higher predictive validity than a learning potential test score. This also suggests that the findings might best be interpreted as tentative support for Jensen's theory.

So the studies on learning potential appear to support the theory that score gains can be summarized in the hierarchical intelligence model. However, more direct tests of the theory are required, and therefore a learning potential study was reanalyzed.

6. Research questions

The research question of this study is whether score gains from test–retest studies and mediated interventions can be summarized in terms of Carroll's three-stratum hierarchical intelligence model. We examined whether (1) correlations between score gains and the g loadedness of the scores are negative in sign, (2) the g loadedness of scores decreases after mediation, and (3) low-g persons show the largest gains after the mediation training. We carried out a meta-analysis to provide a convincing answer to the first research question. In a more exploratory study on learning potential in South Africa, we tried to find support for all three research questions.

7. Test–retest studies

To test whether there is a negative correlation between the g loadings of tests and score gains, we carried out a meta-analysis of all test–retest studies of Dutch, British, and American test batteries available in the Netherlands. All studies were simple practice studies (no intervention such as additional coaching took place) and used well-validated tests.

8. Method

Psychometric meta-analysis (Hunter & Schmidt, 1990) aims to estimate what the results of studies would have been if all studies had been conducted without methodological limitations or flaws. The results of perfectly conducted studies would allow a less obstructed view of the underlying construct-level relationships (Schmidt & Hunter, 1999). One of the goals of the present meta-analysis is to obtain a reliable estimate of the true correlation between standardized test–retest score gains (d) and g. Although the construct of g has been thoroughly studied, the construct underlying score gains is less well understood. One of the aims of the present study is to gain a clearer understanding of the construct underlying score gains by linking it to the g nexus. Carrying out a complete meta-analysis on the relationship between d and g would require the collection of a very large number of datasets. However, applying meta-analytical techniques to a sufficiently large number of studies will also lead to a reliable estimate of the true correlation between d and g. We therefore collected a large number of studies, heterogeneous across various possible moderators.

To get a reliable correlation between g and d, we focused on batteries with a minimum of seven subtests. Libraries and test libraries of universities were searched, and several members of the Dutch Testing Commission and test publishers were contacted. We limited ourselves to non-clinical samples without health problems. Only a minority of test manuals report test–retest studies; before 1970, especially, they are rare. The search yielded virtually all test–retest studies available in the Netherlands. The GATB manual (1970, ch. 20) reports very large datasets on secondary school children who took the GATB with 1-, 2-, and 3-year intervals, respectively. At the time of the first test, large samples of children who were the same age as the test–retest children at the time of the second test also took the test. Through a comparison of the scores, the maturation effects could be separated from the test–retest effects, so we included the data in the present study.
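The logic of separating maturation from practice in the GATB data can be sketched as a simple decomposition of the standardized gain. This is an assumption about how such a comparison could be carried out, not the procedure documented in the GATB manual: it supposes that the age-matched first-time sample is a fair stand-in for the retested children, and `decompose_gain` and the example means are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class GainDecomposition:
    total: float       # standardized gain of the retested group
    maturation: float  # part attributable to being older at the second test
    practice: float    # part attributable to having taken the test before


def decompose_gain(pre_mean: float, retest_mean: float,
                   same_age_first_mean: float, pre_sd: float) -> GainDecomposition:
    """Split a test-retest gain into maturation and practice components.

    pre_mean            -- retest group's mean at the first test
    retest_mean         -- retest group's mean at the second test
    same_age_first_mean -- mean of a separate first-time group whose age equals
                           the retest group's age at the second test
    pre_sd              -- SD of the pretest, used to standardize all gains
    """
    total = (retest_mean - pre_mean) / pre_sd
    maturation = (same_age_first_mean - pre_mean) / pre_sd
    practice = (retest_mean - same_age_first_mean) / pre_sd
    return GainDecomposition(total, maturation, practice)


# Hypothetical means on an arbitrary score scale, for illustration only.
result = decompose_gain(pre_mean=100.0, retest_mean=112.0,
                        same_age_first_mean=106.0, pre_sd=15.0)
print(result)
```

The identity total = maturation + practice holds by construction; the substantive assumption lies entirely in treating the first-time sample tested at the older age as what the retested children would have scored without prior exposure to the test.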

Standardized score gains were computed by dividing the raw score gain by the SD of the pretest. In general, g loadings were computed by submitting a correlation matrix to a principal axis factor analysis and using the loadings of the subtests on the first unrotated factor. In some cases, g loadings were taken from studies where other procedures were followed; these procedures have been shown empirically to lead to highly comparable results. Pearson correlations between the standardized score gains and the g loadings were then computed.
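The three computational steps described above (standardized gains, first-factor g loadings, and their Pearson correlation) can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' software: the principal axis routine extracts a single factor and starts from squared multiple correlations as communality estimates, and all function names and the example battery are hypothetical.

```python
import numpy as np


def first_factor_loadings(corr, max_iter=1000, tol=1e-8):
    """Iterated principal axis factoring, extracting a single factor.

    Returns the loadings of the subtests on the first unrotated factor.
    """
    R = np.asarray(corr, dtype=float)
    # Initial communalities: squared multiple correlation of each subtest
    # with all other subtests.
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    for _ in range(max_iter):
        reduced = R.copy()
        np.fill_diagonal(reduced, h2)  # communalities replace the unit diagonal
        eigvals, eigvecs = np.linalg.eigh(reduced)  # eigenvalues in ascending order
        loadings = eigvecs[:, -1] * np.sqrt(max(eigvals[-1], 0.0))
        new_h2 = loadings**2
        converged = np.max(np.abs(new_h2 - h2)) < tol
        h2 = new_h2
        if converged:
            break
    # The eigenvector's sign is arbitrary; orient the factor positively.
    return loadings if loadings.sum() >= 0 else -loadings


def standardized_gains(pre_means, post_means, pre_sds):
    """d = (posttest mean - pretest mean) / pretest SD, per subtest."""
    pre, post, sd = (np.asarray(x, dtype=float)
                     for x in (pre_means, post_means, pre_sds))
    return (post - pre) / sd


def r_gd(g_loadings, d):
    """Pearson correlation between the vector of g loadings and the vector of d."""
    return float(np.corrcoef(g_loadings, d)[0, 1])


# Hypothetical 7-subtest battery with an exact one-factor structure.
true_loadings = np.array([0.8, 0.7, 0.6, 0.5, 0.4, 0.7, 0.6])
corr = np.outer(true_loadings, true_loadings)
np.fill_diagonal(corr, 1.0)

g = first_factor_loadings(corr)
d = standardized_gains(pre_means=[20, 18, 25, 30, 12, 22, 15],
                       post_means=[21, 19, 27, 33, 14, 23, 16.5],
                       pre_sds=[5, 4, 6, 6, 4, 5, 5])
print(np.round(g, 3), np.round(d, 3), round(r_gd(g, d), 2))
```

With a correlation matrix built from an exact one-factor model, the iteration should recover the generating loadings; real batteries have group factors as well, which is one reason loadings on the first unrotated factor from a multi-factor solution can differ slightly from this single-factor sketch.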

Psychometric meta-analytical techniques (Hunter & Schmidt, 1990, 2004) were applied to the resulting 64 r_gd values using the software package developed by Schmidt and Le (2004). Psychometric meta-analysis is based on the principle that there are artifacts in every dataset and that most of these artifacts can be corrected. In the present study, we corrected for five artifacts, listed by Hunter and Schmidt (1990), that alter the value of outcome measures: (1) sampling error, (2) reliability of the vector of g loadings, (3) reliability of the vector of score gains, (4) restriction of range of g loadings, and (5) deviation from perfect construct validity.
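For a single observed correlation, the corrections for artifacts (2) to (5) can be sketched with textbook formulas: the classical correction for attenuation for the two reliabilities, Thorndike's Case II formula for range restriction, and division by the construct-validity coefficient. This is an assumption-laden illustration: the order of the corrections, the function name, and the parameter names are hypothetical, and a meta-analysis based on artifact distributions corrects the mean correlation rather than each study as done here. The example also shows how a fully corrected value, such as the −1.06 reported in this paper, can exceed 1 in absolute value.

```python
import math


def correct_r(r_obs: float, rel_g: float = 1.0, rel_d: float = 1.0,
              u: float = 1.0, construct_validity: float = 1.0) -> float:
    """Correct one observed r_gd for artifacts (2)-(5), in an assumed order.

    rel_g, rel_d       -- reliabilities of the g-loading and score-gain vectors
    u                  -- SD of g loadings in an unrestricted reference set of
                          tests divided by their SD in this battery (u >= 1)
    construct_validity -- correlation of the g-loading vector with true g
    """
    # (2) + (3): classical correction for attenuation due to unreliability.
    r = r_obs / math.sqrt(rel_g * rel_d)
    # (4): Thorndike Case II correction for direct range restriction.
    r = u * r / math.sqrt((u**2 - 1) * r**2 + 1)
    # (5): correction for imperfect construct validity of the g vector.
    return r / construct_validity


# Hypothetical inputs, for illustration only.
print(round(correct_r(-0.45, rel_g=0.81), 3))              # -0.5
print(round(correct_r(-0.80, rel_g=0.80, rel_d=0.70), 3))  # -1.069
```

Because each correction divides by a value below 1, a strongly negative observed correlation combined with modest reliabilities can be pushed past −1; such values are usually read as indicating a true correlation at or near −1 rather than as literal estimates.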

8.1. Correction for sampling error

In many cases, sampling error explains the majority of the variation between studies, so the first step in a psychometric meta-analysis is to correct the collection of effect sizes for differences in sample size between the studies.
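This first step can be sketched as a bare-bones Hunter-Schmidt analysis: a sample-size-weighted mean correlation, the observed variance of the correlations across studies, and the variance expected from sampling error alone, using the standard formula (1 − r̄²)² / (N̄ − 1). The function name and the input values are hypothetical, and the sketch is not the Schmidt and Le (2004) program.

```python
from typing import Dict, Sequence


def bare_bones(rs: Sequence[float], ns: Sequence[int]) -> Dict[str, float]:
    """Bare-bones meta-analysis: correct the spread of r for sampling error."""
    if not rs or len(rs) != len(ns):
        raise ValueError("need at least one study and one N per correlation")
    total_n = sum(ns)
    mean_r = sum(r * n for r, n in zip(rs, ns)) / total_n
    # N-weighted observed variance of the correlations across studies.
    var_obs = sum(n * (r - mean_r) ** 2 for r, n in zip(rs, ns)) / total_n
    # Variance expected from sampling error alone, using the average N.
    mean_n = total_n / len(ns)
    var_err = (1 - mean_r**2) ** 2 / (mean_n - 1)
    # Remaining variance may reflect moderators or uncorrected artifacts.
    var_res = max(var_obs - var_err, 0.0)
    pct = 100.0 if var_obs == 0 else min(100.0, 100.0 * var_err / var_obs)
    return {"mean_r": mean_r, "var_obs": var_obs, "var_err": var_err,
            "var_residual": var_res, "pct_explained": pct}


# Hypothetical r_gd values and sample sizes from four studies.
summary = bare_bones([-0.62, -0.35, -0.80, -0.51], [120, 45, 300, 80])
for key, value in summary.items():
    print(f"{key:>14}: {value:.4f}")
```

If sampling error accounts for most of the observed variance, the studies can be treated as estimating one underlying correlation; the later steps then correct that mean for the remaining artifacts.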

8.2. Correction for reliability of the vector of g loadings

The value of r_gd is attenuated by the reliability of the vector of g loadings for a given battery. When two samples have a comparable N, the average correlation between their vectors is an estimate of the reliability of each vector. The collection of datasets in the present study included no g vectors for the same battery from different samples, and therefore artifact distributions were based upon other studies reporting g vectors for two or more samples. So the effect sizes and the distribution of reliabilities of the g vector were based upon different samples. When two g vectors were compared, the correlation between them was used, and when more than two g vectors were compared, the average correlation for the various combinations of two vectors was used. The combined N from the samples on which the g vector was based was taken as the weight of one data point.
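The rule for estimating the reliability of a g vector from several samples (the correlation between two vectors, or the mean over all pairs when there are more than two) can be sketched as follows. The function name and the three example vectors are hypothetical, not data from the studies cited in this section.

```python
from itertools import combinations
from typing import Sequence, Tuple

import numpy as np


def g_vector_reliability(vectors: Sequence[Sequence[float]]) -> Tuple[float, int]:
    """Mean correlation over all pairs of g vectors for the same battery.

    Returns (mean correlation, number of pairs). With exactly two vectors
    this is simply their correlation.
    """
    pairs = list(combinations(vectors, 2))
    if not pairs:
        raise ValueError("need g vectors from at least two samples")
    rs = [np.corrcoef(a, b)[0, 1] for a, b in pairs]
    return float(np.mean(rs)), len(pairs)


# Hypothetical g loadings of a 7-subtest battery in three comparable samples.
sample_a = [0.78, 0.65, 0.59, 0.52, 0.41, 0.70, 0.63]
sample_b = [0.75, 0.69, 0.55, 0.49, 0.45, 0.66, 0.60]
sample_c = [0.80, 0.61, 0.62, 0.55, 0.38, 0.72, 0.58]
reliability, n_pairs = g_vector_reliability([sample_a, sample_b, sample_c])
print(f"mean r over {n_pairs} pairs: {reliability:.2f}")
```

A plain arithmetic mean of the pairwise correlations is assumed here, following the wording of the text; averaging Fisher z-transformed values would be a common alternative.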

Several samples were compared that differed little on background variables. For the comparisons using children, we chose samples that were highly comparable with regard to age, and for the comparisons of adults, we chose samples that were roughly comparable with regard to age. In a study on young children, Schroots and van Alphen de Veer (1979) report correlation matrices for the Leidse Diagnostische Test for eight age groups between 4 and 8 years of age. The average correlation between the adjacent age groups is .75 (combined N = 1169). Several studies report data on both younger and older children. The Dutch/Flemish WISC-R (van Haasen et al., 1986) has samples with comparable N of Dutch and Flemish children, so the 11 age groups between 6 and 16 could be compared. This resulted in an average correlation of .78 (combined N = 3018). Jensen (1985) reports g loadings of the 12 subtests of the WISC-R obtained in three large, independent, representative samples of Black and White children. The average correlation between the g vectors obtained for each sample is .86 for the Black children (combined N = 1238) and .93 for the White children (combined N = 2868). In a study on older children, Evers and Lucassen (1991) report the correlation matrices of the Dutch DAT. The average correlation between the g vectors of three educational groups is .88 (combined N = 3300). The US GATB manual (1970, chapter 20) gives correlation matrices for large groups of boys and girls in secondary school. The average correlation between the g vectors of the same-age boys and girls is .97 (combined N = 26,708). Several studies report data on adults. The g loadings of the eight subtests of the GATB are reported by te Nijenhuis and van der Flier (1997) for applicants at Dutch Railways and by de Wolff and Buiten (1963) for seamen at the Royal Dutch Navy, resulting in a correlation of .90 (combined N = 1306). The US GATB manual (1970) gives correlation matrices for two large groups of adults, which yields a correlation between g vectors of .94 (combined N = 4519). Johnson, Bouchard, Krueger, McGue, and Gottesman (2004) report g loadings for a sample that took the WAIS, and Wechsler (1955) reports the correlation matrices of the WAIS for adults of comparable age, so g loadings could be computed. The correlation between the g vectors for the two studies is .72 (combined N = 736). So, it appears that g vectors are quite reliable, especially when the samples are very large.

The number of tests in the batteries in the present study varied from 7 to 14. The number of tests does not necessarily influence the size of r_gd, but it clearly affects its variability. Because variability in the values of the artifacts influences the amount of variance that artifacts explain in observed effect sizes, we estimated this variability using data from the samples described in the previous paragraph.
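In artifact-distribution meta-analysis (Hunter & Schmidt), unreliability enters as an attenuation factor, the square root of the reliability. The g-vector reliabilities and combined Ns listed above can therefore be summarized as an N-weighted mean and SD of these factors. The sketch below does this for the nine reported values; how the Schmidt and Le program weights and combines artifact distributions internally is not described in the text, so treat this as an illustration under that assumption. The function name is hypothetical.

```python
import math

# (reliability of the g vector, combined N), as reported in the text
G_VECTOR_RELIABILITIES = [
    (.75, 1169),   # Leidse Diagnostische Test, adjacent age groups
    (.78, 3018),   # Dutch/Flemish WISC-R, 11 age groups
    (.86, 1238),   # WISC-R, Black samples (Jensen, 1985)
    (.93, 2868),   # WISC-R, White samples (Jensen, 1985)
    (.88, 3300),   # Dutch DAT, three educational groups
    (.97, 26708),  # US GATB, same-age boys and girls
    (.90, 1306),   # GATB, Dutch Railways vs. Royal Dutch Navy
    (.94, 4519),   # US GATB, two adult groups
    (.72, 736),    # WAIS, Johnson et al. (2004) vs. Wechsler (1955)
]


def attenuation_distribution(pairs):
    """N-weighted mean and SD of attenuation factors sqrt(reliability)."""
    n_total = sum(n for _, n in pairs)
    factors = [(math.sqrt(rel), n) for rel, n in pairs]
    mean = sum(a * n for a, n in factors) / n_total
    var = sum(n * (a - mean) ** 2 for a, n in factors) / n_total
    return mean, math.sqrt(var)


mean_a, sd_a = attenuation_distribution(G_VECTOR_RELIABILITIES)
print(f"mean attenuation factor = {mean_a:.3f}, SD = {sd_a:.3f}")
```

Note how strongly the GATB comparison with N = 26,708 pulls the weighted mean toward its value.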

8.3. Correction for reliability of the vector of score gains

The value of r_gd is attenuated by the reliability of the vector of score gains for a given battery. When two samples have a comparable N, the average correlation between vectors is an estimate of the reliability of each vector. The reliability of the vector of score gains was estimated using the present datasets, comparing samples that took the same test and that differed little on background variables. For the comparisons using children, we chose samples that were highly comparable with regard to age, and for the comparisons of adults, we chose samples that were roughly comparable with regard to age.

In the GATB manual (1970, ch. 15), 13 combinations of two studies are described in which large samples of men and women who are comparable with respect to age and background took the same GATB subtests. The average unweighted correlation between the d vectors of men and women is .83 (total N = 3760). In the GATB manual (1970, ch. 20), three combinations of three studies are described in which very large samples of boys and girls in the same grade in secondary school took the same GATB subtests. This yielded correlations between the d vectors of .99, .98, and .94, respectively (total N = 20,541). Together, van Geffen (1972) and Bosch (1973) report three Dutch GATB test–retest studies on children in secondary school, resulting in three comparisons between d vectors. The average N-weighted correlation between the d vectors is .47 (total N = 127). Vectors of score gains from two different datasets on the WISC-R were compared: Tuma and Appelbaum (1980) tested children with an average age of 10, and Wechsler (1974) tested 10- and 11-year-olds. The correlation between the two d vectors is .71 (total N = 147). Comparison of vectors of score gains from datasets on the DAT (Bennett, Seashore, & Wesman, 1974) resulted in correlations of .78 and .73, respectively, an average r of .76 (total N = 254). So, it appears that d vectors are quite reliable, especially when the samples are very large. We estimated the reliabilities of the d vectors in the database using data from the samples described in this paragraph.
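For a single correlation, unreliability in both vectors is removed with the classical correction for attenuation: divide the observed r_gd by the square root of the product of the two reliabilities. A minimal sketch; the function name and the example reliabilities are hypothetical (chosen within the ranges reported above), and the meta-analysis itself applies these corrections through artifact distributions rather than study by study.

```python
import math


def disattenuate(r_obs, rel_g, rel_d):
    """Correct r_gd for unreliability of the g vector (rel_g) and of the d vector (rel_d)."""
    for rel in (rel_g, rel_d):
        if not 0.0 < rel <= 1.0:
            raise ValueError("reliabilities must lie in (0, 1]")
    return r_obs / math.sqrt(rel_g * rel_d)


# Hypothetical battery: observed r_gd = -.60, g-vector reliability .90, d-vector reliability .80
print(round(disattenuate(-0.60, 0.90, 0.80), 2))  # -0.71
```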

8.4. Correction for restriction of range of g loadings

The value of r_gd is attenuated by the restriction of range of g loadings in many of the standard test batteries. The most highly g-loaded batteries tend to have the smallest range of variation in the subtests' g loadings. Jensen (1998a, pp. 381–382) shows that restriction in g loadedness strongly attenuates the correlation between g loadings and standardized group differences. Hunter and Schmidt (1990, pp. 47–49) state that the solution to range variation is to define a reference population and express all correlations in terms of that reference population. The Hunter and Schmidt meta-analytical program computes what the correlation in a given population would be if the standard deviation were the same as in the reference population. The standard deviations can be compared by dividing the study population standard deviation by the reference group population standard deviation, that is, u = SD_study/SD_ref. As the reference, we took the tests that are broadly regarded as exemplary for the measurement of the intelligence domain, namely the various versions of the Wechsler tests for children. The average standard deviation of g loadings of the various Dutch and US versions of the WISC-R and the WISC-III was 0.128. So, the SD of g loadings of all test batteries was compared to the average SD of g loadings in the Wechsler tests for children. As a result, some batteries, such as the GATB, had a value of u larger than 1.00.
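The text does not give the formula the Schmidt and Le program applies. The sketch below assumes the standard Thorndike Case II correction for direct range restriction, which expresses r in a reference population with SD_ref = 0.128. With u < 1, |r| is increased. With u > 1, as for the GATB, |r| is reduced. The function name and the example SDs are hypothetical.

```python
import math

SD_REF = 0.128  # average SD of g loadings in the Dutch and US WISC-R and WISC-III


def correct_for_range(r, sd_study, sd_ref=SD_REF):
    """Express r in the reference population (Thorndike Case II, direct range restriction).

    u = sd_study / sd_ref. With u < 1 the study's g loadings vary less than in the
    reference tests and |r| grows; with u > 1 they vary more and |r| shrinks.
    """
    u = sd_study / sd_ref
    if u <= 0:
        raise ValueError("standard deviations must be positive")
    big_u = 1.0 / u
    return r * big_u / math.sqrt(1.0 + r * r * (big_u * big_u - 1.0))


# Hypothetical batteries with an observed r_gd of -.50
for sd_study in (0.064, 0.128, 0.256):
    print(sd_study, round(correct_for_range(-0.50, sd_study), 3))
```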

8.5. Correction for deviation from perfect construct validity

The deviation from perfect construct validity in g attenuates the value of r_gd. In making up any collection of cognitive tests, we do not have a perfectly representative sample of the entire universe of all possible cognitive tests. So, any one limited sample of tests will not yield exactly the same g as any other limited sample. The sample values of g are affected by psychometric sampling error, but the fact that g is very substantially correlated across different test batteries implies that the differing obtained values of g can all be interpreted as estimates of a "true" g. The value of r_gd is attenuated by psychometric sampling error in each of the batteries from which a g factor has been extracted.

The more tests and the higher their g loadings, the higher the g saturation of the composite score. The Wechsler tests have a large number of subtests with quite high g loadings, resulting in a highly g-saturated composite score. Jensen (1998a, pp. 90–91) states that the g score of the Wechsler tests correlates more than .95 with the tests' IQ score. However, shorter batteries with a substantial number of tests with lower g loadings will lead to a composite with a somewhat lower g saturation. Jensen (1998a, ch. 10) states that the average g loading of an IQ score as measured by various standard IQ tests is in the +.80s. When we take this value as an indication of the degree to which an IQ score is a reflection of "true" g, we can estimate that a test's g score correlates about .85 with "true" g. As g loadings are the correlations of tests with the g score, it is most likely that most empirical g loadings will underestimate "true" g loadings; so, empirical g loadings correlate about .85 with "true" g loadings. As the Schmidt and Le computer program only includes corrections for the first four artifacts, the correction for deviation from perfect construct validity was carried out on the value of r_gd after correction for the first four artifacts. To limit the risk of overcorrection, we conservatively chose the value of .90 for the correction.

Table 1
Dutch, British, and US studies of correlations between g loadings and gain scores

Reference                                 Test          r      N  Information
Drenth et al. (1968)                      AKIT       −.57    100  Primary-school children
van Geffen (1972)                         GATB       −.45     42  Secondary-school children
                                                     −.21     42
Bosch (1973)                              GATB       −.07     43  Secondary-school children
Schroots and van Alphen de Veer (1979)    LDT        −.42     96  Pre-school and secondary-school children
Bleichrodt et al. (1987)                  RAKIT       .09     49  Pre-school children
                                                     −.25     51  Primary-school children
                                                     −.21     49  Primary-school children
van der Doef et al. (1989)                WISC-R     −.69     22  Primary-school children with learning problems
Mulder et al. (2004)                      KAIT       −.23     46  Secondary-school children + young adults
                                                     −.42     25  Adults
Kort et al. (2005)                        WISC-III   −.15     42  Primary-school children
                                                     −.26     67  Primary-school children
                                                     −.46     39  Secondary-school children
Luteijn and Barelds (2005)                GIT2       −.51     44  Adults
Kooij et al. (2005)                       WAIS-III   −.63     60  Adults
Elliott (1983)                            BAS        −.65     60  Primary-school children
Wechsler (1967)                           WPPSI      −.46     50  Pre-school children
United States Department of Labor (1970)  GATB       −.35    156  Office applicants
                                                     −.66    605  Male high school seniors
                                                     −.70    554  Female high school seniors
                                                     −.58    223  Males, 1-day interval
                                                     −.41    186  Females, 1-day interval
                                                     −.50    202  Males, 2-week interval
                                                     −.52    152  Females, 2-week interval
                                                     −.67    156  Males, 6-week interval
                                                     −.61    168  Females, 6-week interval
                                                     −.43    176  Males, 13-week interval
                                                      .02    149  Females, 13-week interval
                                                     −.62    157  Males, 26-week interval
                                                     −.32    136  Females, 26-week interval
                                                     −.69    119  Males, 1-year interval
                                                     −.31    183  Females, 1-year interval
                                                     −.96    118  Males, 2-year interval
                                                     −.75    170  Females, 2-year interval
                                                     −.75    123  Males, 3-year interval
                                                     −.48    183  Females, 3-year interval
                                                     −.92   3398  Boys, secondary school
                                                     −.92   3680  Girls, secondary school
                                                     −.91   3348  Boys, secondary school
                                                     −.91   3491  Girls, secondary school
                                                     −.84   3229  Boys, secondary school
                                                     −.87   3395  Girls, secondary school
Wechsler (1974)                           WISC-R     −.48     97  Primary-school children
                                                     −.66    102  Primary-school children
                                                     −.21    104  Secondary-school children
Bennett et al. (1974)                     DAT        −.79     92  Boys, secondary school
                                                     −.53     81  Girls, secondary school
                                                     −.29     81  Boys, secondary school
                                                     −.62    100  Girls, secondary school
Covin (1977)                              WISC-R     −.57     30  Primary-school children with learning problems
Tuma and Appelbaum (1980)                 WISC-R     −.08     45  Primary- and secondary-school children
Matarazzo et al. (1980)                   WAIS       −.10     29  Young males
Wechsler (1981)                           WAIS-R     −.64     71  Adults
                                                     −.48     48  Adults
McCormick et al. (1983)                   ASVAB      −.73     57  Adults
Kaufman and Kaufman (1983)                K-ABC      −.27     46  Pre-school children
                                                     −.18     36  Pre- and primary-school children
                                                     −.22     70  Primary-school children
Wechsler (1997)                           WAIS-III   −.45    100  Young adults
                                                     −.57    102  Adults
                                                     −.51    104  Adults
                                                      .03     88  Adults
Reeve and Lam (2005)                      EAS        −.34    123  Undergraduate students

In general, the g loadings were based on the correlation matrix taken from the manuals containing the test–retest studies or on the correlation matrix based on the largest sample size we could find. The sources of the g loadings when not taken from the manual containing the test–retest study are as follows. van Geffen (1972) and Bosch (1973): de Wolff and Buiten (1963), see also Johnson, te Nijenhuis, and Bouchard (in press); Bleichrodt et al. (1987): te Nijenhuis et al. (2004), who used the same data on which the RAKIT manual is based; van der Doef, Kwint, and van der Koppel (1989): Dutch WISC-R manual; Elliott (1983): Table 9.8, age 9.0–9.11 years; US Dept. of Labor's GATB (1970): Jensen (1985, p. 214), using the largest correlation matrix in the GATB manual; Wechsler (1974), Covin (1977), and Tuma and Appelbaum (1980): Jensen (1985, p. 214, first study); Bennett et al. (1974): average of four highly similar correlation matrices; Matarazzo et al. (1980): Wechsler's (1955, p. 17) Table 8 for ages 25–34; McCormick et al. (1983): Ree and Carretta (1994). Reeve and Lam (2005) utilize SEM analyses and use item parcels instead of full-scale scores to compute g loadings; the average g loading of all the item parcels for a specific subtest was taken as the g loading of that subtest.
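The construct-validity correction is a single division of the corrected correlation by the assumed correlation between empirical and "true" g loadings. Using ρ = −.95 from Table 2, the conservative value of .90 reproduces the final estimate of −1.06. The sketch also shows what the less conservative estimate of .85 discussed above would give. The function name is hypothetical.

```python
def correct_construct_validity(rho, validity=0.90):
    """Divide rho by the assumed correlation between empirical and 'true' g loadings."""
    if not 0.0 < validity <= 1.0:
        raise ValueError("validity must lie in (0, 1]")
    return rho / validity


rho_four = -0.95  # true correlation after the first four corrections (Table 2)
print(round(correct_construct_validity(rho_four), 2))        # -1.06 with the conservative .90
print(round(correct_construct_validity(rho_four, 0.85), 2))  # -1.12 with the .85 estimate
```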

9. Results

The results of the studies on the correlation between g loadings and gain scores are shown in Table 1. The table gives data derived from 64 studies, with participants numbering a total of 26,990. The table gives the reference for each study, the cognitive ability test used, the correlation between g loadings and gain scores, the sample size, and background information on the study. Virtually all correlations are negative, and the few positive correlations are very small.

Table 2 shows the results of the psychometric meta-analysis of the 64 data points. It shows (from left to right) the number of correlation coefficients (K), the total sample size (N), the mean observed correlation (r) and its standard deviation (SDr), the true correlation one can expect once artifactual error from unreliability in the g vector and the d vector and range restriction in the g vector has been removed (ρ), and its standard deviation (SDρ). The next two columns present the percentage of variance explained by artifactual errors (VE) and the 95% credibility interval (95% CI). This interval denotes the values one can expect for ρ in 19 out of 20 cases.

Table 2
Meta-analysis results for correlations between g loadings and gain scores after corrections for reliability and restriction of range

Studies included       K       N      r   SDr      ρ   SDρ    VE  95% CI
All                   64  26,990   −.80   .20   −.95   .11    81  −.74 to −1.16
All minus 3 outliers  61  26,704   −.81   .18   −.95   .03    99  −.91 to −1.00

K = number of correlations; N = total sample size; r = mean observed correlation (sample size weighted); SDr = standard deviation of observed correlation; ρ = true correlation (observed correlation corrected for unreliability and range restriction); SDρ = standard deviation of true correlation; VE = percentage of variance accounted for by artifactual errors; 95% CI = 95% credibility interval.

The large number of data points and the very large sample size indicate that we can have confidence in the outcomes of this meta-analysis. The estimated true correlation has a value of −.95, and 81% of the variance in the observed correlations is explained by artifactual errors. However, Hunter and Schmidt (1990) state that extreme outliers should be left out of the analyses, because they are most likely the result of errors in the data. They also argue that strong outliers artificially inflate the SD of effect sizes and thereby reduce the amount of variance that artifacts can explain. We chose to leave out three outliers (more than 4 SD below the average r and more than 8 SD below ρ), comprising 1% of the research participants. This resulted in no change in the value of the true correlation, a large decrease of 74% in the SD of ρ, and a large increase of 22% in the amount of variance in the observed correlations explained by artifacts. So, when the three outliers are excluded, artifacts explain virtually all of the variance in the observed correlations. Finally, a correction for deviation from perfect construct validity in g took place using a conservative value of .90. This resulted in a value of −1.06 for the final estimated true correlation between g loadings and score gains. Applying several corrections in a meta-analysis may lead to correlations that are larger than 1.00 or −1.00, as is the case here. Percentages of variance accounted for by artifacts larger than 100% are also not uncommon in psychometric meta-analysis, and they also occur in other methods of statistical estimation (see Hunter & Schmidt, 1990, pp. 411–414, for a discussion).
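The 95% credibility interval is ρ ± 1.96 SDρ. A minimal sketch; the function name is hypothetical. With the rounded values in Table 2 for all 64 studies (ρ = −.95, SDρ = .11), it gives roughly −.73 to −1.17. This is slightly wider than the reported −.74 to −1.16, presumably because the reported interval was computed from unrounded values.

```python
def credibility_interval(rho, sd_rho, z=1.96):
    """Interval expected to contain the true correlation in 19 out of 20 cases."""
    return rho - z * sd_rho, rho + z * sd_rho


lower, upper = credibility_interval(-0.95, 0.11)  # all 64 studies, rounded Table 2 values
print(f"{upper:.2f} to {lower:.2f}")
```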

10. Discussion

A large-scale meta-analysis of 64 test–retest studies shows that, after corrections for several artifacts, there is an estimated true correlation of −1.06 between the g loadings of tests and score gains, and virtually all of the variance in observed correlations is attributable to these artifacts. As several artifacts explain virtually all the variance in the effect sizes, other dimensions on which the studies differ, such as the age of the test takers, the test–retest interval, the test used, and whether the samples had average IQs or learning problems, play no role at all.

The estimated true correlation of −1.06 is the result of various corrections for artifacts that attenuate the correlations. The estimated values of the artifacts may underestimate or overestimate the population values of the artifacts. Therefore, estimates of true effect sizes may overestimate or underestimate the population values of the effect size. As a solution to this problem, Hunter and Schmidt (2004) suggest carrying out several meta-analyses on the same construct and taking the average estimated effect size of all meta-analyses. The general idea is that meta-analysis is a powerful research tool but does not give perfect outcomes.

A correlation of −1.06 falls outside the range of acceptable values of a correlation, but one has to distinguish between the meta-analytical estimate of the true correlation between g and d and the true correlation between g and d itself. We interpret the value of −1.06 for the meta-analytical estimate as meaning that the true correlation between g and d is −1.00. A correlation of −1.00 means that there is a perfect inverse relationship between g and score gains: the tests with the highest g loadings show the smallest gains. The most straightforward interpretation of this very large negative correlation is that there is no g saturation in test–retest gain scores.

11. The South African learning potential study

In a carefully carried-out study, Skuy et al. (2002) used a dynamic testing procedure to see whether it would improve the scores of Black South African students on Raven's Standard Progressive Matrices (RSPM). The Bantu Education Act of 1954 established a discriminatory educational system characterized by poorly qualified teachers, sparsely equipped and funded schools, and generally poor quality. Most Black students in the sample had not received the same quality of education as White students. Black, White, Indian, and Colored research participants took the RSPM on two occasions, and in between, randomly constituted experimental groups were exposed to the Mediated Learning Experience. Both the Black South African group and the group consisting of White, Indian, and Colored South Africans improved over their baseline on the RSPM, and the Black group showed greater improvement.

The value of these cognitive interventions increases when the score gains transfer to other tests and to external criteria, such as school or work achievement. Therefore, the research participants also took Feuerstein's Representational Stencil Design Test as a transfer measure. The subject is presented with a stencil of a geometric design and is then asked to point to which stencils need to be used, and in what sequence, in order to construct an identical design. Like the RSPM, the Stencils test requires representational/abstract thinking, but the training on the RSPM showed little transfer to it. Moreover, the correlation of the RSPM scores with performance in the end-of-year psychology examination did not significantly improve after mediation. Once again, the score gains were empty: they did not generalize. Skuy et al. go on to ask what it is that was improved by their interventions. Professor Skuy made his data accessible to the present authors, so we could perform additional analyses.

12. Sample

The data from Skuy et al. (2002) were used, with the exception of the data from three research participants whose pretest IQ scores were extremely low (more than 3 SDs below the group mean). Ninety-five university students in psychology, aged 16 to 29 (mean age = 20, SD = 2.3; 25 males, 70 females), participated in this study. There were 66 Black students (20 males, 46 females) and 29 White (20), Indian (6), and Colored (3) students (5 males, 24 females). The mean age of the Black group was 20 (SD = 2.5) and that of the White/Indian/Colored group was 19 years (SD = 1). Subjects were randomly assigned to the experimental group (n = 55) and to the control group (n = 40).


13. Procedure

The students participated in pre- and posttest phases, with a group intervention in between. The study focused on improvement in scores on the RSPM, using the Set Variations II of the Learning Propensity Assessment Device as the mediation task. Mediation training took 3 h and was conducted by three experienced psychologists with the assistance of six postgraduate psychology students. A detailed description is given in Skuy et al. (2002).

14. Measures and cognitive intervention

The Raven's Standard Progressive Matrices consists of 60 items (divided into 5 sets of 12 items) designed to measure the ability to form comparisons, to reason by analogy, and to organize spatial information into related wholes. It has been established as one of the purest measures of g (Jensen, 1998a). Skuy et al. (2002) found no evidence for test bias against Blacks in South African education. Rushton, Skuy, and Bons (2004) showed that the Raven's gave comparable predictive validities for students from various groups. Cross-cultural testing research has clearly shown that insufficient proficiency in the language of the test can lead to biased assessments in tests with a strong verbal component; however, the Raven's is a non-verbal test.

The Learning Propensity Assessment Device consists of 14 exercises. Each exercise contains an initial mediation task. Subsequent tasks increase in complexity and novelty and aim to help the learner achieve mastery over the task. The purpose of mediation is to help the learner develop the appropriate cognitive strategies and functions needed for the successful completion of the task. The Set Variations II of the Learning Propensity Assessment Device consists of five sets of items, which comprise variations of Sets C, D, and E of the RSPM. Each set of variations contains a learning task for the purpose of initial mediation, followed by a series of progressively more difficult variations to which the skills learned must be applied. Mediation involves discussing with groups how to define the problem to be solved, focus on the task, set rules, regulate problem-solving behavior, and identify the correct sequence of logical steps needed to solve the task. Mediation also involves helping the subject to develop appropriate concepts, verbal tools, and insights in relation to the task. A detailed description is given in Skuy et al. (2002).

15. Statistical analyses

Although the Skuy et al. study is among the South African learning potential studies with the largest sample size, the N is not large. We therefore chose basic statistical analyses.

15.1. Descriptive statistics

Means, standard deviations, and reliabilities were computed for the various groups. With regard to measures of effect size, Hunter and Schmidt (1990, p. 271) advise choosing estimates of variance with the least error. Because repeated test takings tend to change the size of the SD (Ackerman, 1987), we chose the SD of the pretest for the denominator. The correlation between scores before and after the training was computed to see whether the training had an effect on the rank order of individuals' scores.
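The effect size used here, the mean gain divided by the pretest SD, can be checked against the group means and SDs in Table 4. The sketch below reproduces the four d values in that table. Multiplying by 15 expresses d on the IQ metric, which gives the gain of roughly 14 IQ points for the Black experimental group mentioned in the Discussion. The dictionary and function names are hypothetical.

```python
# (pretest M, posttest M, pretest SD) per group, from Table 4
TABLE_4 = {
    "Black experimental": (43.78, 50.10, 6.64),
    "Black control": (45.46, 48.35, 6.69),
    "Other experimental": (50.20, 55.80, 6.05),
    "Other control": (52.71, 55.36, 3.45),
}


def pretest_sd_effect_size(pre_mean, post_mean, pre_sd):
    """Mean gain divided by the pretest SD (retesting tends to change the SD)."""
    return (post_mean - pre_mean) / pre_sd


for group, (pre, post, sd) in TABLE_4.items():
    d = pretest_sd_effect_size(pre, post, sd)
    print(f"{group}: d = {d:.2f} ({15 * d:.0f} IQ points)")
# d = 0.95, 0.43, 0.93, 0.77
```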

15.2. Correlation between score gains and g loadedness

Because our sample was not large and was quite specific, estimates of g loadedness were taken from Lynn, Allik, and Irwing's (2004) item analysis of the RSPM in Estonia, which used a large (N = 2735), nationally representative sample. The same reasoning as in psychometric meta-analysis applies, namely that larger samples give better estimates of g loadings than smaller samples. In a hierarchical factor analysis of the items using structural equation modeling, Lynn et al. computed g loadings of 52 of the 60 items. In the present study, Pearson correlations were calculated between the g loadings of these 52 items and the effect sizes on these items.

15.3. g loadings

The RSPM consists of dichotomous items, so we computed a correlation matrix of polychoric correlations (Nunnally & Bernstein, 1994). A principal axis factor analysis was carried out. The percentage of variance explained by the first unrotated factor was taken as an estimate of g loadedness. Because sample size was limited, we collapsed the experimental and the control group.
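For dichotomous items, the polychoric correlation reduces to the tetrachoric correlation. This is the ρ for which a standard bivariate normal distribution, cut at thresholds matching the two items' pass rates, reproduces the observed proportion of test takers passing both items. The pure-Python sketch below assumes that model and finds ρ by bisection. It evaluates the bivariate normal CDF through Plackett's identity (the derivative with respect to ρ is the bivariate normal density) and Simpson's rule. It is illustrative only: statistical packages use more refined routines, and the function names are hypothetical. Items passed by every test taker, as Table 3 suggests for several early items, have no defined tetrachoric correlation. Requires Python 3.8+ for `NormalDist`.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def bvn_density(h, k, r):
    """Standard bivariate normal density at (h, k) with correlation r."""
    det = 1.0 - r * r
    return math.exp(-(h * h - 2.0 * r * h * k + k * k) / (2.0 * det)) / (2.0 * math.pi * math.sqrt(det))


def bvn_cdf(h, k, rho, steps=200):
    """P(X <= h, Y <= k): Phi(h)Phi(k) plus the integral of the density over r in [0, rho]."""
    width = rho / steps  # steps must be even for Simpson's rule
    total = bvn_density(h, k, 0.0) + bvn_density(h, k, rho)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * bvn_density(h, k, i * width)
    return STD_NORMAL.cdf(h) * STD_NORMAL.cdf(k) + total * width / 3.0


def tetrachoric(x, y, tol=1e-7):
    """Tetrachoric correlation between two 0/1 item-score lists."""
    n = len(x)
    p_x, p_y = sum(x) / n, sum(y) / n
    p_both = sum(1 for a, b in zip(x, y) if a and b) / n
    if p_x in (0.0, 1.0) or p_y in (0.0, 1.0):
        raise ValueError("an item without variance has no tetrachoric correlation")
    # An item is passed when the latent score exceeds tau = Phi^-1(1 - p);
    # by symmetry, P(X > tau_x, Y > tau_y) = Phi2(-tau_x, -tau_y; rho).
    h = -STD_NORMAL.inv_cdf(1.0 - p_x)
    k = -STD_NORMAL.inv_cdf(1.0 - p_y)
    lo, hi = -0.999, 0.999
    while hi - lo > tol:  # Phi2 increases with rho, so bisection applies
        mid = (lo + hi) / 2.0
        if bvn_cdf(h, k, mid) < p_both:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Six hypothetical test takers: each item is passed by half, both items by one third
item_a = [1, 1, 1, 0, 0, 0]
item_b = [1, 1, 0, 1, 0, 0]
print(round(tetrachoric(item_a, item_b), 3))  # 0.5 (theory: sin(pi/6))
```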

15.4. Correlation between sum scores and score gains

We tested whether individuals with low g improved their scores more than those with high g by correlating gain scores with pretest RSPM scores for each of the four research groups. As gain scores tend to be negatively correlated with pretest scores as a function of unreliability (see Cronbach, 1990; Nunnally & Bernstein, 1994), we corrected the correlations using Tucker, Damarin, and Messick's (1966) formula 63. Using the formula, one adds to each correlation the term (SD_pretest / SD_gain score)(1 − reliability_pretest).

Table 3
Proportion of sample selecting the correct answer on items of Raven's Standard Progressive Matrices, by group

         Set A                 Set B                 Set C                 Set D                 Set E
Item   Black Other^a  Item   Black   Other  Item   Black   Other  Item   Black   Other  Item   Black   Other
   1    1.00    1.00    13    1.00    1.00    25    1.00     .97    37    1.00    1.00    49     .74     .90
   2     .97    1.00    14    1.00    1.00    26     .96    1.00    38     .99    1.00    50     .64     .90
   3     .97    1.00    15    1.00    1.00    27     .96    1.00    39     .89    1.00    51     .79     .97
   4    1.00     .97    16     .91     .97    28     .86     .93    40     .92    1.00    52     .56     .83
   5    1.00    1.00    17     .96     .97    29     .94     .97    41     .96    1.00    53     .52     .83
   6     .99    1.00    18     .85    1.00    30     .76     .83    42     .92    1.00    54     .35     .76
   7     .94     .97    19     .77     .66    31     .88     .97    43     .77    1.00    55     .42     .79
   8     .91     .93    20     .79     .97    32     .50     .79    44     .76     .93    56     .21     .69
   9    1.00     .97    21     .83     .97    33     .74     .90    45     .71     .97    57     .30     .41
  10     .91     .97    22     .92    1.00    34     .61     .79    46     .79     .93    58     .12     .41
  11     .83     .90    23     .80     .90    35     .53     .69    47     .29     .41    59     .02     .17
  12     .68     .83    24     .59     .83    36     .06     .35    48     .26     .38    60     .11     .21

^a Other = White, Indian, and Colored.
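The correction term described above is a one-line computation. A minimal sketch that implements the term exactly as the text states it. The per-group SDs of gain scores and pretest reliabilities are not reported here, so the example inputs and the function name are hypothetical.

```python
def tucker_corrected_r(r_gain_pretest, sd_pretest, sd_gain, rel_pretest):
    """Add (SD_pretest / SD_gain) * (1 - reliability_pretest) to the gain-pretest correlation.

    The added term offsets the negative gain-pretest correlation that
    measurement error in the pretest produces on its own.
    """
    if sd_gain <= 0:
        raise ValueError("the SD of gain scores must be positive")
    return r_gain_pretest + (sd_pretest / sd_gain) * (1.0 - rel_pretest)


# Hypothetical inputs: r = -.60, pretest SD 6.6, gain SD 5.0, pretest reliability .80
print(round(tucker_corrected_r(-0.60, 6.6, 5.0, 0.80), 2))
```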

16. Results

16.1. Descriptive statistics

Internal consistencies (Cronbach's αs) on the RSPM ranged from .76 to .86 for the pre- and posttests, respectively. Table 3 shows the proportion of each of the groups that selected the correct answer on each of the 60 items of the pretest. Across the 60 items, the order of the p values was almost identical for Blacks and White/Indian/Coloreds (r = .92, p = .00).

Table 4 shows the means and standard deviations for the total RSPM scores for the four groups, along with the d effect sizes representing the difference between pre- and posttest scores (Cohen, 1988).

Table 4
Pre- and posttest mean Raven's scores, standard deviations, and mean effect sizes for Black and White/Indian/Colored students

              Black experimental    Black control         Other^a experimental  Other control
              (n = 40)              (n = 26)              (n = 15)              (n = 14)
              Pretest    Posttest   Pretest    Posttest   Pretest    Posttest   Pretest    Posttest
Raw scores
  M           43.78      50.10      45.46      48.35      50.20      55.80      52.71      55.36
  SD          6.64       5.31       6.69       6.71       6.05       3.76       3.45       3.43
Percentile    14         41         16         31         41         75         55         68
Effect size   0.95                  0.43                  0.93                  0.77

Percentiles are based on US adult norms; see Raven, Raven, and Court (2000), Table SPM13.
^a Other = White, Indian, and Colored.

First, we examined whether there was an effect of race (Black vs. White/Indian/Colored) and group (experimental vs. control) on the pretest scores. There was a significant effect of race (F(1, 91) = 24.13, p = .00, η² = .21) but not of group (F(1, 91) = 2.28, p = .14, η² = .02). This means that the mean pretest scores of Blacks (M = 44.44, SD = 6.65) were lower than those of White/Indian/Coloreds (M = 51.41, SD = 5.05) and that the mean pretest scores of the experimental and control groups were comparable (M = 45.53, SD = 7.04 and M = 48, SD = 6.7, respectively).

Secondly, we investigated the effects of training on the posttest scores by performing a two-way ANCOVA on the total posttest scores, with race and group as factors and the total pretest scores as the covariate. There was a significant effect of group (F(1, 95) = 13.81, p = .00, η² = .13) and of race (F(1, 90) = 3.99, p = .05, η² = .04), but not of the two-way interaction of group and race (F(1, 90) = 0.28, p = .60, η² = .00). These results indicate that the training was equally effective for the Black and the White/Indian/Colored students. Posttest scores of Blacks (M = 49.41, SD = 5.91), however, remained significantly lower (F(1, 91) = 28.33, p = .00) than those of Whites/Indians/Coloreds (M = 55.59, SD = 3.55). Although posttest scores of the experimental group (M = 51.65, SD = 5.53) were higher than those of the control group (M = 50.8, SD = 6.65), the differences between the two groups were nonsignificant (F(1, 91) = 0.85, p = .36).
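The η² values reported for these analyses are consistent with partial eta squared computed from each F ratio and its degrees of freedom: η²p = F·df_effect / (F·df_effect + df_error). A minimal sketch that recomputes them. The function name is hypothetical, and the text does not state which η² variant was reported.

```python
def partial_eta_squared(f, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return f * df_effect / (f * df_effect + df_error)


REPORTED = [  # (effect, F, df_effect, df_error, reported eta^2)
    ("race, pretest", 24.13, 1, 91, .21),
    ("group, pretest", 2.28, 1, 91, .02),
    ("group, posttest ANCOVA", 13.81, 1, 95, .13),
    ("race, posttest ANCOVA", 3.99, 1, 90, .04),
    ("group x race, posttest ANCOVA", 0.28, 1, 90, .00),
]

for label, f, df1, df2, eta in REPORTED:
    print(f"{label}: {partial_eta_squared(f, df1, df2):.2f} (reported {eta:.2f})")
```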

The correlation between scores before and after the training was .84 (p = .00) for the experimental group and .90 (p = .00) for the control group, showing that the training had only a limited effect on the rank order of individuals' scores. This means that the test strongly, but not perfectly, measures the same constructs on both occasions.

16.2. Correlation between score gains and g loadedness

We estimated effect sizes for each of the four groups (race by condition) by computing the difference between mean pretest and posttest scores, divided by the standard deviation of the pretest scores of Black and White/Indian/Colored students, respectively. Finally, we calculated the correlations between the effect sizes and the g loadings taken from Lynn et al. Correlations were −.24 (p = .10) for the Black experimental group, −.21 (p = .20) for the White/Indian/Colored experimental group, −.08 (p = .59) for the Black control group, and −.41 (p = .01) for the White/Indian/Colored control group. Small sample sizes usually attenuate correlations (Hunter & Schmidt, 1990). Collapsing the groups indeed resulted in higher average correlations: −.39 for the complete experimental group and −.26 for the complete control group.

16.3. g loadings

Using the combined experimental and control group, principal axis factor analyses on the pretest and posttest scores resulted in a first unrotated factor explaining 22% of the variance in the pretest scores and 18% of the variance in the posttest scores. These findings suggest that the g loadedness of the RSPM decreased substantially after the Mediated Learning Experience.

16.4. Correlation between score gains and sum score

Correlating score gains with RSPM total scores resulted in values of −.60 (p = .00) for the Black experimental group, −.18 (p = .38) for the Black control group, −.82 (p = .00) for the White/Indian/Colored experimental group, and −.48 (p = .08) for the White/Indian/Colored control group. After applying the correction formula of Tucker et al. (1966), these correlations became −.39, −.08, −.61, and −.35, respectively. Overall, these correlations show that low-g persons improved their scores more strongly than high-g persons.

17. Discussion

Skuy et al. (2002) hypothesized that the low-quality education of Blacks in South Africa would lead IQ tests to underestimate their cognitive abilities. Groups of Black and White/Indian/Colored students took the Raven's Progressive Matrices twice and, in between, received Feuerstein's Mediated Learning Experience. The test scores went up substantially in all groups. Evidence for an authentic change in the g factor requires broad transfer or generalizability across a wide variety of cognitive performance. However, Skuy et al. show that the gains did not generalize to scores on another, highly similar test or to external criteria, and were therefore hollow. As the score gains were in some cases quite large (14 IQ points for the Black experimental group), the question becomes: what is it that improved?

The findings show that the correlations between score gains and g loadedness of the items were −.39 for the complete experimental group and −.26 for the complete control group. However, because the g loadings and gain scores are measured at the item level, their reliabilities are not high, resulting in substantial attenuation of the correlation between g and d. Moreover, the RSPM does not measure g perfectly: Jensen (1998a, p. 91) estimates its g loading at .83. When we estimate the reliability of the g vector at .70 and the reliability of the gain score vector at .50, correcting only for unreliability and for deviation from perfect construct validity of g yields estimated true correlations of −.80 and −.53, respectively. These values should be taken as underestimates; controlling for additional artifacts will bring them closer to the very strong negative correlation found in the meta-analysis.
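The corrections just described follow the classical disattenuation formula, with the g loading of the RSPM (.83) used as an additional divisor for imperfect construct validity: ρ = r / (√(r_gg · r_dd) · λ_g), where r_gg = .70 and r_dd = .50 are the assumed reliabilities of the g vector and the gain vector. A minimal sketch, with a hypothetical function name:

```python
import math


def corrected_r(r_obs, rel_g, rel_d, g_loading=1.0):
    """Correct an observed vector correlation for unreliability of both vectors
    and for the test's imperfect g loading (construct validity)."""
    return r_obs / (math.sqrt(rel_g * rel_d) * g_loading)


for label, r in [("experimental", -0.39), ("control", -0.26)]:
    print(label, round(corrected_r(r, rel_g=0.70, rel_d=0.50, g_loading=0.83), 2))
```

With the rounded inputs this gives about −.79 and −.53; the small difference from −.80 for the experimental group presumably reflects unrounded input values.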

The findings suggest that after training the g loadedness of the test decreased substantially. We found substantial negative correlations between gain scores and RSPM total scores. Table 4 shows that the total score variance decreased after training, which is in line with low-g subjects increasing more than high-g subjects. Since, as a rule, high-g individuals profit the most from training (as is reflected in the ubiquitous positive correlation between IQ scores and training performance; Jensen, 1980; Schmidt & Hunter, 1998), these findings could be interpreted as an indication that Feuerstein's Mediated Learning Experience is not g-loaded, in contrast with regular trainings, which are clearly g-loaded. Substantial negative correlations between gain scores and RSPM total scores are no definite proof of this hypothesis, but they are in line with it. Additional substantiation of our hypothesis that the Feuerstein training has little or no g loadedness is that Coyle (2006) showed that gain scores loaded virtually zero on the g factor. Moreover, Skuy et al. reported that the predictive validity of their measure did not increase when the second Raven score was used. The fact that individuals with low g gained more than those with high g could be interpreted as an indication that the Mediated Learning Experience was not g-loaded. It should be noted, however, that Feuerstein most likely did not intend his intervention to be g-loaded. He was interested in increasing the performance of low scorers on both tests and external criteria.

J. te Nijenhuis et al. / Intelligence 35 (2007) 283–300

18. General discussion

IQ scores are by far the best general predictor of success in education, job training, and work. However, there are many ways in which these IQ scores can be increased, for instance by means of retesting or participating in a learning potential training program. What conclusions can be drawn from such score gains? Jensen's (1998a) hypothesis that the effects of training on abilities can be summarized in terms of Carroll's three-stratum hierarchical factor model was tested in a meta-analysis on test–retest data using Dutch, British, and American test batteries, and with learning potential data from South Africa using Raven's Progressive Matrices. The meta-analysis convincingly shows that test–retest score gains are not g-loaded. The findings from the learning potential study are clearly in line with this: when the attenuation caused by unreliability and other artifacts is taken into account, the correlation between the g loadings of items and the gains on items has a value somewhat comparable to the one found in the meta-analysis for test batteries. The data suggest that the g loadedness of item scores decreases after the intervention training. Te Nijenhuis et al.'s (2001) finding that practice and coaching reduced the g loadedness of their test scores strengthens the present findings using item scores. The findings show that it is not the high-g participants who increase their scores the most, as is common in training situations, but the low-g persons who show the largest increases. This suggests that the intervention training is not g-loaded.

Our findings fit quite well with the hierarchical model of intelligence. The generalizability of test scores resides predominantly in the g component, whereas the test-specific ability component and the narrow ability component are virtually non-generalizable. This is evidenced, for instance, by the earlier finding that adding verbal tests or numerical tests to a g score resulted in only a very small incremental validity (Ree & Earles, 1991; Ree et al., 1994). Additionally, Ericsson and Lehmann (1996) reported immense gains on a memory task focusing on one narrow ability but did not find any improvement on comparable memory tasks focusing on another narrow ability. As the score gains are not related to g, the generalizable g component decreases; and since it is not unlikely that the Feuerstein training itself is not g-loaded, it is easy to understand why the score gains did not generalize to scores on the cognitively loaded Representational Stencil Design Test. For a similar reason, the score gains did not generalize to g-loaded external criteria: the correlation of the RSPM scores with performance in the end-of-year psychology examination did not significantly improve after mediation. Reeve and Lam (2005) claimed that retesting does not change the nature of what is being tested, but our findings suggest the opposite.

19. Limitations of the studies

Our meta-analysis and our analysis of the South African study are strongly based on the method of correlated vectors (MCV), which has recently been shown to have limitations. Dolan and Lubke (2001) have shown that, when comparing groups, substantial positive vector correlations can still be obtained even when groups differ not only on g but also on factors uncorrelated with g. Ashton and Lee (2005) show that associations of a variable with non-g sources of variance can produce a vector correlation of zero even when the variable is strongly associated with g. They suggest that the g loadings of a subtest are sensitive to the nature of the other subtests in a battery, so that a specific sample of subtests may cause a spurious correlation between the vectors. Notwithstanding these limitations, studies using MCV continue to appear (see, for instance, Colom, Jung, & Haier, in press; Hartmann, Kruuse, & Nyborg, in press; Lee et al., 2006). The outcomes of our meta-analysis of a large number of studies using MCV may make an interesting contribution to the discussion on the limitations of the method.

A principle of meta-analysis is that the amount of information contained in one individual study is quite modest. Therefore, one should carry out an analysis of all studies on one topic and correct for artifacts, leading to a strong increase in the amount of information. The fact that our meta-analytical value of r = −1.06 is virtually identical to the theoretically expected correlation between g and d of −1.00 holds some promise that a psychometric meta-analysis of studies using MCV is a powerful way of reducing some of the limitations of MCV. An alternative methodological approach is to limit oneself to the rare datasets that enable the use of structural equation modeling. However, from a meta-analytical point of view, these studies yield only a quite modest amount of information.
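The first step of a psychometric (Hunter–Schmidt) meta-analysis is a "bare-bones" analysis: a sample-size-weighted mean correlation, plus a comparison of the observed variance of the correlations with the variance expected from sampling error alone, (1 − r̄²)² / (N̄ − 1). The sketch below covers only that step. The further corrections for artifacts such as unreliability divide the estimate by attenuation factors and can therefore push it beyond ±1, as with the value of −1.06; those corrections are not included. The function name and the input values are hypothetical.

```python
def bare_bones_meta(rs, ns):
    """Hunter-Schmidt bare-bones meta-analysis of study-level correlations."""
    total_n = sum(ns)
    # Sample-size-weighted mean correlation.
    r_bar = sum(n * r for r, n in zip(rs, ns)) / total_n
    # Sample-size-weighted observed variance of the correlations.
    var_obs = sum(n * (r - r_bar) ** 2 for r, n in zip(rs, ns)) / total_n
    # Variance expected from sampling error alone, using the mean sample size.
    n_bar = total_n / len(ns)
    var_err = (1 - r_bar ** 2) ** 2 / (n_bar - 1)
    var_res = max(var_obs - var_err, 0.0)
    pct_err = 100.0 if var_obs == 0 else min(100.0, 100.0 * var_err / var_obs)
    return {"r_bar": r_bar, "var_obs": var_obs, "var_err": var_err,
            "var_res": var_res, "pct_sampling_error": pct_err}


# Hypothetical vector correlations from three studies and their sample sizes.
print(bare_bones_meta([-0.9, -0.7, -0.95], [120, 80, 300]))
```

If sampling error accounts for most of the observed variance, the studies can be treated as estimating one common correlation.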

Additional meta-analyses of studies employing MCV are necessary to establish the validity of the combination of MCV and psychometric meta-analysis. Most likely, many would agree that a high positive meta-analytical correlation between measures of g and measures of another construct implies that g plays a major role, and that a meta-analytical correlation of −1.00 implies that g plays no role. However, it is not clear what value of the meta-analytical correlation to expect from MCV when g plays only a modest role. After the present meta-analysis on a construct that clearly has an inverse relationship with g, it would be informative to carry out meta-analyses of studies on variables that are strongly linked to g and variables that are modestly linked to g. An example of the latter would be secular score gains, which according to Lynn's (1990) nutrition theory should be modestly g-loaded.

The sample sizes in the South African study are not large, but still larger than those in many other studies of learning potential, where an N ≈ 10 is not unusual. A reanalysis of the many existing studies on dynamic testing could lead to a meta-analysis with a large combined N. The mean posttest score was quite high, so a ceiling effect may have taken place for the White/Indian/Colored group, leading to an underestimation of the experimental score gain for this group.
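How a ceiling can shrink an observed gain shows up in a toy simulation. It assumes normally distributed, continuous pretest scores, an identical true gain for everyone, no measurement error, and a maximum score of 60 (the number of items in the Standard Progressive Matrices); the other values and the function name are hypothetical. Nobody can score above the maximum, so persons near the top gain less than the true amount and the observed mean gain falls below it.

```python
import random


def observed_mean_gain(pre_mean, sd, true_gain, max_score, n=10_000, seed=7):
    """Mean observed gain when both test occasions are capped at max_score."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        ability = rng.gauss(pre_mean, sd)
        pre = min(ability, max_score)               # pretest capped at the maximum
        post = min(ability + true_gain, max_score)  # posttest capped as well
        total += post - pre
    return total / n


print(round(observed_mean_gain(50, 6, 6, max_score=60), 2))     # ceiling at 60
print(round(observed_mean_gain(50, 6, 6, max_score=10**9), 2))  # effectively no ceiling
```

Without a ceiling the observed mean gain equals the true gain of 6; with the ceiling it is noticeably smaller, which is the direction of bias suggested above for the White/Indian/Colored group.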

Instead of testing the hypothesis with a strongly unidimensional test such as the RSPM, it would be better to use a multidimensional test. Moreover, a large sample size would allow the use of more rigorous data-analytical techniques, leading to more definitive results. However, to the best of our knowledge, datasets meeting these requirements do not exist, and the Skuy et al. study is arguably the best South African learning potential study.

20. Score gains as low-quality measures of motivation

As criterion-related validity is strongly dependent on g, te Nijenhuis et al.'s finding of lowered g loadings after training should result in lowered criterion-related validity. However, the empirical findings show the opposite: virtually all test–retest and test preparation studies on cognitive tests and scholastic aptitude tests that reported both criterion-related validities demonstrate small to modest increases in criterion-related validity for the second or third test score (see Allalouf & Ben-Shakhar, 1998; Bashi, 1976; Coyle, 2006; Hausknecht, Trevor, & Farr, 2002; Jones, 1986; Linn, 1977; Olsen & Schrader, 1959; Ortar, 1960; Powers, 1985; Reeve & Lam, 2005). In the carefully designed study by Allalouf and Ben-Shakhar (1998) of a university entrance test, the experimental group received an intensive 40-h test coaching program, while the control group did not. The criterion-related validity for the retest increased for both groups. Most importantly, the increase was the same: it was not larger for the experimental group.

In a little-known but carefully designed large-scale learning potential study, Resing (1990, see Table 4.23) compared an experimental group that received a pretest, a learning potential training, and a posttest against a control group that received only the pretest and the posttest. The mean criterion-related validity of the various second scores was .62 for both the experimental and the control group. Learning potential training did not result in incremental criterion-related validity over and above the validity resulting from simply retesting. The findings of both Resing and Allalouf and Ben-Shakhar suggest that cognitive interventions do not increase criterion-related validity more than simple retesting does.

g and the personality measure conscientiousness have been shown to make an excellent combination of predictors (Schmidt & Hunter, 1998). Conscientiousness represents, among other characteristics, persistence, a will to achieve, and the ability to focus effort on the goal. A field study on test preparation using actual job applicants (Clause, Delbridge, Schmitt, Chan, & Jennings, 2001) showed that motivation to perform well on the test correlated .25 with test performance. One could speculate that score increases do not reflect a true cognitive component but rather become low-quality measures of motivation. Further, since the increase in validity due to retesting and learning potential training is modest in comparison to the large increase obtainable from the use of personality questionnaires, personality testing might provide a less expensive and more accurate alternative.
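The incremental-validity argument rests on a standard result for two predictors: the multiple correlation depends on each predictor's validity and on their intercorrelation, R² = (r₁² + r₂² − 2r₁r₂r₁₂) / (1 − r₁₂²). A second predictor that is largely redundant with g adds little, whereas an equally valid predictor unrelated to g adds considerably more. The sketch uses hypothetical values, not estimates from the studies cited, and the function name is hypothetical.

```python
import math


def multiple_r(r1, r2, r12):
    """Multiple correlation of a criterion with two standardized predictors."""
    r_squared = (r1 ** 2 + r2 ** 2 - 2 * r1 * r2 * r12) / (1 - r12 ** 2)
    return math.sqrt(r_squared)


r_g = 0.50  # hypothetical validity of a g measure
# A second predictor with validity .30 that correlates .60 with the g measure
# carries no unique information (.30 = .50 * .60), so R stays at .50.
print(round(multiple_r(r_g, 0.30, 0.60), 3))  # prints 0.5
# The same validity from a predictor uncorrelated with g raises R noticeably.
print(round(multiple_r(r_g, 0.30, 0.00), 3))  # prints 0.583
```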

21. Effectiveness of various training formats

Components of the mediation training used by Skuy et al. (2002) are similar to the test training used in te Nijenhuis et al. (2001). Both the Dutch training and the South African training took 3 h, but whereas the Dutch training focused on two different test formats, the South African training dealt with only one test format. The test training by Lloyd and Pidgeon (1961) took even less time, namely two half-hour segments, each focusing on one test format. The effect sizes in all studies were roughly comparable. This suggests that the methodologies employed by te Nijenhuis et al. and Lloyd and Pidgeon were more efficient than those used by Skuy et al. It is possible that the components of the mediation training that are not present in the other two training formats are not effective in raising test scores and could therefore be left out. If so, it might be possible to increase scores on the RSPM by one SD with a relatively simple 1-h training.

22. Generalizability of findings

Can these findings of hollow score gains after test–retest, test practice, and Mediated Learning Experience training be generalized to other studies where training-induced score gains were found? Ericsson and Lehmann (1996) reported tremendous score increases after intensive training on numeric memory tests, but these gains did not generalize in the least to verbal memory tests. Such gains on one narrow ability do not generalize to another narrow ability clustering under the same broad ability and are therefore hollow. Similarly, Jensen (1998b) showed that score gains due to adoption were not on the g factor and were therefore most likely hollow.

Rushton (1999) argued that intergenerational score gains are not linked to g, suggesting that the Flynn effect may be empty, but he was strongly criticized by Flynn (1999, 2000). In studies on the Flynn effect, score gains found in cross-sectional studies are largest on the RSPM (Flynn, 1987). Lynn (1998) has suggested that a substantial part of these intergenerational score gains on the RSPM is generalizable (they do reflect higher g), but that the remaining part is hollow and should be interpreted as a schooling effect. The RSPM does require the application of the mathematical principles of addition, subtraction, progression, and the distribution of values. In the three decades (1950s–1980s) over which these increases in RSPM scores occurred, increasing proportions of 15- to 18-year-olds remained in school, where they learned math skills that they applied to the solution of matrices problems. Our findings could be interpreted as support for Lynn's hypothesis of the partial hollowness of score gains on the RSPM. Notwithstanding the high g loading of the sum score of the RSPM, it is quite sensitive to test–retest effects and training effects. Some studies on the Flynn effect (Lynn & Hampson, 1986; Teasdale & Owen, 1989) show that the increase in scores is largely concentrated in the lower segments of the IQ distribution. Our finding that low scorers show the largest gains after training may additionally support the notion that part of the Flynn effect on the RSPM is hollow. Finally, Wicherts et al.'s (2004) findings show that in some of their datasets the secular score gains are most strongly linked to broad, narrow, and test-specific abilities, showing that an important part of the gains is non-generalizable.

Ceci (1991) showed that increased schooling leads to higher IQ scores, but are these gains highly specific or predominantly generalizable? It would be interesting to apply the techniques we used in this study to the findings from previous intervention studies. It may be that biological interventions (such as diet, vitamin supplements, and vaccination against infectious disease), rather than psychological or educational interventions, are the most cost-effective method of producing true changes in g and broad abilities. There may be a biological barrier between the first stratum and the second stratum that restricts the effects of behavioral interventions to narrow abilities and test specificities.

Acknowledgement

We would like to thank Mervyn Skuy for his permission to use his dataset.

Thanks to Marié de Beer, Raegan Murphy, Welko Tomic, Art Jensen, and Frank Schmidt for feedback on previous versions of this paper.

Thanks to Arne Evers, Wilma Resing (Dutch Test Committee), and Andress Kooij (Harcourt) for also helping in locating test–retest studies.

References

Ackerman, P. L. (1986). Individual differences in information processing: An investigation of intellectual abilities. Intelligence, 10, 101–139.

Ackerman, P. L. (1987). Individual differences in skill learning: An integration of psychometric and information processing skills. Psychological Bulletin, 102, 3–27.

Allalouf, A., & Ben-Shakhar, G. (1998). The effect of coaching on the predictive validity of scholastic aptitude tests. Journal of Educational Measurement, 35(1), 31–47.

Ashton, M. C., & Lee, K. (2005). Problems with the method of correlated vectors. Intelligence, 33, 431–444.

Bashi, Y. (1976). Verbal and non-verbal abilities of 4th, 6th, and 8th grade students in the Arab educational system in Israel. Jerusalem: Hebrew University, School of Education.

Bleichrodt, N., Resing, W. C. M., Drenth, P. J. D., & Zaal, J. N. (1987). Intelligentie-meting bij kinderen: Empirische en methodologische verantwoording van de geReviseerde Amsterdamse Kinder Intelligentie Test [Measuring the intelligence of children: Empirical and methodological justification of the Revised Amsterdam Children Intelligence Test]. Lisse, the Netherlands: Swets.

Bennett, G. K., Seashore, H. G., & Wesman, A. G. (1974). Differential Aptitude Tests (5th ed.): Manual. New York: The Psychological Corporation.

Boeyens, J. C. A. (1989). Learning potential: An empirical investigation. Pretoria, South Africa: Human Science Research Council.

Bosch, F. (1973). Inventarisatie, beschrijving en onderzoek m.b.t. de wijzigingen van de GATB, incl. test-hertest onderzoek (No. Pz3bRp0120) [Stock-taking, description, and research concerning the modifications of the GATB, including a test–retest study]. Utrecht, the Netherlands: Nederlandse Spoorwegen.

Carroll, J. B. (1993). Human cognitive abilities: A survey of factor analysis studies. Cambridge University Press.

Ceci, S. J. (1991). How much does schooling influence general intelligence and its cognitive components? A reassessment of the evidence. Developmental Psychology, 27, 703–722.

Christian, K., Bachnan, H. J., & Morrison, F. J. (2001). Schooling and cognitive development. In R. J. Sternberg & E. L. Grigorenko (Eds.), Environmental effects on cognitive abilities (pp. 287–335). Mahwah, NJ: Erlbaum.

Clause, C. S., Delbridge, K., Schmitt, N., Chan, D., & Jennings, D. (2001). Test preparation activities and employment test performance. Human Performance, 14, 149–167.

Cohen, J. (1988). Statistical power analysis for the behavioral sciences. Hillsdale: Lawrence Erlbaum.

Colom, R., Jung, R. E., & Haier, R. J. (in press). Finding the g-factor in brain structure using the method of correlated vectors. Intelligence.

Covin, T. A. (1977). Stability of the WISC-R for 9-year-olds with learning difficulties. Psychological Reports, 40, 1297–1298.

Coyle, T. R. (2006). Test–retest changes on scholastic aptitude tests are not related to g. Intelligence, 34, 15–27.

Cronbach, L. J. (1990). Essentials of psychological testing. New York: HarperCollins.

de Villiers, A. B. (1999). Disadvantaged students' academic performance: Analysing the zone of proximal development. Unpublished DPhil thesis, University of Cape Town, South Africa.

de Wolff, C. J., & Buiten, B. (1963). Een factoranalyse van vier testbatterijen [A factor analysis of four test batteries]. Nederlands Tijdschrift voor Psychologie, 18, 220–239.

Dolan, C. V., & Lubke, G. (2001). Viewing Spearman's hypothesis from the perspective of multigroup PCA: A comment on Schönemann's criticism. Intelligence, 29, 231–245.

Drenth, P. J. D., Petrie, J. F., & Bleichrodt, N. (1968). Handleiding bij de Amsterdamse Kinder Intelligentie Test [Manual of the Amsterdam Children Intelligence Test]. Amsterdam: Vrije Universiteit.

Elliott, C. D. (1983). British Ability Scales. Manual 2: Technical handbook. Windsor, Great Britain: NFER-Nelson.

Engelbrecht, M. (1999). Leerpotensiaal as voorspeller van akademiese sukses van universiteitsstudente [Learning potential as predictor of the academic success of university students]. Unpublished DPhil thesis, Potchefstroom University for Christian Higher Education, South Africa.

Ericsson, K. A., & Lehmann, A. C. (1996). Expert and exceptional performance: Evidence of maximal adaptation to task constraints. Annual Review of Psychology, 47, 273–305.

Evers, A., & Lucassen, W. (1991). Handleiding DAT 83: Differentiële Aanleg Testserie [Manual DAT 83: Differential Aptitude Test series]. Amsterdam: Swets.

Fleishman, E. A., & Hempel, W. E. (1955). The relation between abilities and improvement with practice in a visual discrimination reaction task. Journal of Experimental Psychology, 49, 301–312.

Flynn, J. R. (1987). Massive IQ gains in 14 nations: What IQ tests really measure. Psychological Bulletin, 101, 171–191.

Flynn, J. R. (1999). Evidence against Rushton: The genetic loading of WISC-R subtests and the causes of between-group IQ differences. Personality and Individual Differences, 26, 373–379.

Flynn, J. R. (2000). IQ gains, WISC subtests and fluid g: g theory and the relevance of Spearman's hypothesis to race. In G. R. B. J. Goode (Ed.), The nature of intelligence (pp. 202–227). New York: Wiley.

Gaydon, V. P. (1988). Predictors of performance of disadvantaged adolescents on the Soweto/Alexandra gifted child programme. Unpublished MEd dissertation, University of the Witwatersrand, South Africa.

Gottfredson, L. S. (1997). Why g matters: The complexity of everyday life. Intelligence, 24(1), 79–132.

Gottfredson, L. S. (2002). g: Highly general and highly practical. In R. J. Sternberg & E. L. Grigorenko (Eds.), The general intelligence factor: How general is it? (pp. 331–380). Mahwah, NJ: Erlbaum.

Grigorenko, E. L., & Sternberg, R. J. (1998). Dynamic testing. Psychological Bulletin, 124, 75–111.

Haeck, W., Yeld, N., Conradie, J., Robertson, N., & Shall, A. (1997). A developmental approach to mathematics testing for university admissions and course placement. Educational Studies in Mathematics, 33, 71–91.

Hartmann, P., Kruuse, N. H. S., & Nyborg, H. (in press). Testing the cross-racial generality of Spearman's hypothesis in two samples. Intelligence.

Hausknecht, J. P., Trevor, C. O., & Farr, J. L. (2002). Retaking ability tests in a selection setting: Implications for practice effects, training performance, and turnover. Journal of Applied Psychology, 87(2), 243–254.

Hunter, J. E., & Schmidt, F. L. (1990). Methods of meta-analysis. London: Sage.

Hunter, J. E., & Schmidt, F. L. (2004). Methods of meta-analysis (2nd ed.). London: Sage.

Jensen, A. R. (1980). Bias in mental testing. London: Methuen.

Jensen, A. R. (1985). The nature of the black–white difference on various psychometric tests: Spearman's hypothesis. Behavioral and Brain Sciences, 8, 193–263.

Jensen, A. R. (1998a). The g factor: The science of mental ability. London: Praeger.

Jensen, A. R. (1998b). Adoption data and two g-related hypotheses. Intelligence, 25, 1–6.

Johnson, W., Bouchard, T. J., Krueger, R. F., Jr., McGue, M., & Gottesman, I. I. (2004). Just one g: Consistent results from three test batteries. Intelligence, 32, 95–107.

Johnson, W., te Nijenhuis, J., & Bouchard, T. J., Jr. (in press). Replication of the hierarchical visual-perceptual-image rotation model in de Wolff and Buiten's (1963) battery of 46 tests of mental ability. Intelligence.

Jones, R. J. (1986). A comparison of the predictive validity of the MCAT for coached and uncoached students. Journal of Medical Education, 61, 335–338.

Kaufman, A. S., & Kaufman, N. L. (1983). K-ABC: Kaufman Assessment Battery for Children. Interpretive manual. Circle Pines, MN: AGS.

Kooij, A. P., Rolfhus, E., Wilkins, C., Yang, Z., & Zhu, J. (2005). WAIS-III Nederlandstalige bewerking: Technisch rapport hernormering [WAIS-III adaptation in Dutch: Technical report renorming]. Amsterdam: Harcourt.

Kort, W., Schittekatte, M., Dekker, P. H., Verhaeghe, P., Compaan, E. L., Bosmans, M., & Vermeir, G. (2005). WISC-III NL: Wechsler Intelligence Scale for Children, Derde Editie NL. Handleiding en verantwoording [The Dutch WISC-III: Wechsler Intelligence Scale for Children, Third Edition for the Netherlands. Manual and justification]. Amsterdam: NIP.

Kulik, J. A., Bangert-Drowns, R. L., & Kulik, C. C. (1984). Effectiveness of coaching for aptitude tests. Psychological Bulletin, 95, 179–188.

Kulik, J. A., Kulik, C. C., & Bangert, R. L. (1984). Effects of practice on aptitude and achievement test scores. American Educational Research Journal, 21, 435–447.

Lee, K. H., Choi, Y. Y., Gray, J. R., Cho, S. H., Chae, J.-H., Lee, S., et al. (2006). Neural correlates of superior intelligence: Stronger recruitment of posterior parietal cortex. NeuroImage, 29(2), 578–586.

Linn, R. L. (1977). On the treatment of multiple scores for Law School Admission Test repeaters (Report LSAC-77-4). In Law School Admission Council, Reports of LSAC sponsored research: Volume III, 1975–1977. Princeton, NJ: Law School Admission Council.

Lipson, L. E. (1992). Relationship of static and dynamic measures to scholastic achievement of black pupils. Unpublished MEd dissertation, University of the Witwatersrand, South Africa.

Lloyd, F., & Pidgeon, D. A. (1961). An investigation into the effects of coaching on non-verbal test material with European, Indian, and African children. British Journal of Educational Psychology, 31, 145–151.

Luteijn, F., & Barelds, D. P. H. (2005). GIT2: Groninger Intelligentie Test 2 [GIT2: Groningen Intelligence Test 2]. Amsterdam: Harcourt.

Lynn, R. (1990). The role of nutrition in secular increases in intelligence. Personality and Individual Differences, 11, 273–285.

Lynn, R. (1998). In support of the nutrition theory. In U. Neisser (Ed.), The rising curve: Long-term gains in IQ and related measures (pp. 207–215). Washington, DC: American Psychological Association.

Lynn, R., Allik, J., & Irwing, P. (2004). Sex differences on three factors identified in Raven's Standard Progressive Matrices. Intelligence, 32, 411–424.

Lynn, R., & Hampson, S. (1986). The rise of national intelligence: Evidence from Britain, Japan and the USA. Personality and Individual Differences, 7, 23–32.

Matarazzo, J. D., Carmody, T. P., & Jacobs, L. D. (1980). Test–retest reliability and stability of the WAIS: A literature review with implications for clinical practice. Journal of Clinical Neuropsychology, 2(2), 89–105.

McCormick, B. K., Dunlap, W. P., Kennedy, R. S., & Jones, M. B. (1983). The effects of practice on the Armed Forces Vocational Aptitude Test Battery. US Army Research Institute for the Behavioral and Social Sciences, Technical Report 602.

Mulder, J. L., Dekker, R., & Dekker, P. H. (2004). Kaufman Intelligentietest voor adolescenten en volwassenen (KAIT): Handleiding [Kaufman Intelligence Test for adolescents and adults (KAIT): Manual]. Leiden, the Netherlands: PITS.

Murphy, R. (2002). A review of South African research in the field of dynamic assessment. Unpublished MA dissertation, University of Pretoria (available online from http://upetd.up.ac.za/thesis/available/etd-05042002-161239).

Nel, A. (1997). Die voorspelling van akademiese sukses binne konteks van 'n alternatiewe universiteitstoelatingsbeleid [The prediction of academic success within the context of an alternative policy of university admission]. Unpublished MA dissertation, Rand Afrikaans University, South Africa.

Neubauer, A. C., & Freudenthaler, H. H. (1994). Reaction time in a sentence-picture verification test and intelligence: Individual strategies and effects of extended practice. Intelligence, 19, 193–218.

Nunnally, J. C., & Bernstein, I. H. (1994). Psychometric theory (3rd ed.). New York: McGraw-Hill.

Olsen, M., & Schrader, W. B. (1959). The use of preliminary and final Scholastic Aptitude Test scores in predicting college grades (College Entrance Examination Board Research and Development Reports and Statistical Reports 59-19). Princeton, NJ: Educational Testing Service.

Ortar, G. R. (1960). Improving test validity by coaching. Educational Research, 2, 137–142.

Powers, D. E. (1985). Effects of test preparation on the validity of Graduate Admission Test. Applied Psychological Measurement, 9, 179–190.

Raven, J., Raven, J. C., & Court, J. H. (2000). Standard Progressive Matrices. Raven manual: Section 3. Oxford Psychologists Press.

Ree, M. J., & Carretta, T. R. (1994). The correlation of general cognitive ability and psychomotor tracking tests. International Journal of Selection and Assessment, 2, 209–216.

Ree, M. J., & Earles, A. A. (1991). Predicting training success: Not much more than g. Personnel Psychology, 44, 321–332.

Ree, M. J., Earles, J. A., & Teachout, M. S. (1994). Predicting job performance: Not much more than g. Journal of Applied Psychology, 79, 518–524.

Reeve, C. L., & Lam, H. (2005). The psychometric paradox of practice effects due to retesting: Measurement invariance and stable ability estimates in the face of observed score changes. Intelligence, 33, 535–549.

Resing, W. C. M. (1990). Intelligentie en leerpotentieel: Een onderzoek naar het leerpotentieel van jonge leerlingen uit het basis- en speciaal onderwijs [Intelligence and learning potential: A study into the learning potential of young students in basic and special education]. Amsterdam, the Netherlands: Swets.

Rushton, J. P. (1999). Secular gains in IQ are not related to the g factor and inbreeding depression, unlike black–white differences: A reply to Flynn. Personality and Individual Differences, 26, 381–389.

Rushton, J. P., Skuy, M., & Bons, T. A. (2004). Construct validity of Raven's Advanced Progressive Matrices for African and non-African engineering students in South Africa. International Journal of Selection and Assessment, 12(3), 220–229.

Schmidt, F. L., & Hunter, J. E. (1998). The validity and utility of selection methods in personnel psychology: Practical and theoretical implications of 85 years of research findings. Psychological Bulletin, 124(2), 262–274.

Schmidt, F. L., & Hunter, J. E. (1999). Theory testing and measurement error. Intelligence, 27(3), 183–198.

Schmidt, F. L., & Le, H. (2004). Software for the Hunter–Schmidt meta-analysis methods. University of Iowa, Department of Management and Organization, Iowa City, IA 42242.

Schroots, J. J. F., & van Alphen de Veer, R. J. (1979). LDT: Leidse Diagnostische Test. Deel 1: Handleiding [LDT: Leiden Diagnostic Test. Part 1: Manual]. Lisse, the Netherlands: Swets.

Shochet, I. M. (1986). Manifest and potential performance in advantaged and disadvantaged students. Unpublished DPhil dissertation, University of the Witwatersrand, South Africa.

Skuy, M., Gewer, A., Osrin, Y., Khunou, D., Fridjon, P., & Rushton, J. P. (2002). Effects of mediated learning experience on Raven's Matrices scores of African and non-African university students in South Africa. Intelligence, 30, 221–232.

Swanson, H. E., & Lussier, C. M. (2001). A selective synthesis of the experimental literature on dynamic assessment. Review of Educational Research, 71, 321–363.

Teasdale, T. W., & Owen, D. R. (1989). Continuing secular increase in intelligence and a stable prevalence of high intelligence levels. Intelligence, 13, 255–262.

Tuma, J. M., & Appelbaum, A. S. (1980). Reliability and practice effects of WISC-R IQ estimates in a normal population. Educational and Psychological Measurement, 40, 671–678.

te Nijenhuis, J., Tolboom, E., Resing, W., & Bleichrodt, N. (2004). Does cultural background influence the intellectual performance of children from immigrant groups? Validity of the RAKIT intelligence test for immigrant children. European Journal of Psychological Assessment, 20, 10–26.

te Nijenhuis, J., & van der Flier, H. (1997). Comparability of GATB scores for immigrants and majority group members: Some Dutch findings. Journal of Applied Psychology, 82, 675–687.

te Nijenhuis, J., Voskuijl, O. F., & Schijve, N. B. (2001). Practice and coaching on IQ tests: Quite a lot of g. International Journal of Selection and Assessment, 9, 302–308.

Thorndike, R. L. (1985). The central role of general ability in prediction. Multivariate Behavioral Research, 20, 241–254.

Tucker, L. R., Damarin, F., & Messick, S. (1966). A base-free measure of change. Psychometrika, 31(4), 457–473.

van der Doef, M. P., Kwint, J. M., & van der Koppel (1989). Wat leren moeilijk lerende kinderen van de WISC-R? [What do children who have difficulties in learning learn from the WISC-R?] Kind en Adolescent, 10, 136–141.

United States Department of Labor. (1970). Manual for the USTES General Aptitude Test Battery. Section III: Development. Washington, DC: United States Department of Labor.

van Geffen (1972). De betrouwbaarheid van de GATB 1002-B op brugklasniveau [The reliability of the GATB 1002-B for the first class at secondary school]. Catholic University Nijmegen, the Netherlands: Psychology of Work and Organisation.

van Haasen, P. P., de Bruyn, E. E. J., Pijl, Y. J., Poortinga, Y. H., Lutje Spelberg, H. C., Vander Steene, G., et al. (1986). WISC-R: Wechsler Intelligence Scale for Children-Revised, Nederlandstalige uitgave [WISC-R: Wechsler Intelligence Scale for Children-Revised, Dutch edition]. Lisse, the Netherlands: Swets.

Wechsler, D. (1955). Manual for the Wechsler Adult Intelligence Scale. New York: The Psychological Corporation.

Wechsler, D. (1967). Manual for the Wechsler Preschool and Primary Scale of Intelligence. New York: The Psychological Corporation.

Wechsler, D. (1974). Manual for the Wechsler Intelligence Scale for Children-Revised. New York: The Psychological Corporation.

Wechsler, D. (1981). WAIS-R manual: Wechsler Adult Intelligence Scale-Revised. New York: The Psychological Corporation.

Wechsler, D. (1997). WAIS-III: Wechsler Adult Intelligence Scale-third edition and WMS-III: Wechsler Memory Scale-third edition. Technical manual. New York: The Psychological Corporation.

Wicherts, J. W., Dolan, C. V., Oosterveld, P., van Baal, G. C. V., Boomsma, D. I., & Span, M. M. (2004). Are intelligence tests measurement invariant over time? Investigating the nature of the Flynn effect. Intelligence, 32(5), 509–537.

Yeld, N., & Haeck, W. (1997). Educational histories and academic potential: Can tests deliver? Assessment and Evaluation in Higher Education, 22, 5–16.

Zaaiman, H. (1998). Selecting students for Mathematics and Science: The challenge facing higher education in South Africa. Pretoria, South Africa: HSRC Publishers.

Zaaiman, H., van der Flier, H., & Thijs, G. D. (2001). Dynamic testing in selection for an educational programme: Assessing South African performance on the Raven Progressive Matrices. International Journal of Selection and Assessment, 9, 258–269.

Zolezzi, S. A. (1992). Alternative selection measures for university undergraduate admissions. Unpublished MEd dissertation, University of the Witwatersrand, South Africa.

Zolezzi, S. A. (1995). The effectiveness of dynamic assessment as an alternative aptitude testing strategy. Unpublished DPhil dissertation, University of South Africa, South Africa.


the principle that there are artifacts in every dataset and that most of these artifacts can be corrected. In the present study, we corrected for five artifacts, listed by Hunter and Schmidt (1990), that alter the value of outcome measures: (1) sampling error, (2) reliability of the vector of g loadings, (3) reliability of the vector of score gains, (4) restriction of range of g loadings, and (5) deviation from perfect construct validity.

8.1. Correction for sampling error

In many cases, sampling error explains the majority of the variation between studies, so the first step in a psychometric meta-analysis is to correct the collection of effect sizes for differences in sample size between the studies.
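As a minimal sketch of this first ("bare-bones") step, the Python below computes the sample-size-weighted mean correlation, the N-weighted observed variance of the correlations, and the variance expected from sampling error alone, using the usual Hunter and Schmidt estimate (1 − r̄²)² / (N̄ − 1). The function name and the two input studies are hypothetical; this is not the meta-analysis program the authors used.

```python
from typing import Sequence


def bare_bones(rs: Sequence[float], ns: Sequence[int]) -> dict:
    """Sample-size-weighted mean r, observed variance of r, expected
    sampling-error variance, and the residual variance after removing it."""
    if not rs or len(rs) != len(ns):
        raise ValueError("rs and ns must be non-empty and of equal length")
    total_n = sum(ns)
    mean_r = sum(r * n for r, n in zip(rs, ns)) / total_n
    # N-weighted variance of the observed correlations around the mean
    var_obs = sum(n * (r - mean_r) ** 2 for r, n in zip(rs, ns)) / total_n
    # Variance expected from sampling error alone, using the average N
    mean_n = total_n / len(ns)
    var_err = (1 - mean_r ** 2) ** 2 / (mean_n - 1)
    return {
        "mean_r": mean_r,
        "var_obs": var_obs,
        "var_err": var_err,
        "var_residual": max(var_obs - var_err, 0.0),
    }


# Two hypothetical studies, for illustration only
result = bare_bones([-0.5, -0.7], [100, 300])
print(result)
```

Whatever variance remains after subtracting sampling error is what the further artifact corrections below are meant to account for.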

8.2. Correction for reliability of the vector of g loadings

The value of r_gd is attenuated by the reliability of the vector of g loadings for a given battery. When two samples have a comparable N, the average correlation between vectors is an estimate of the reliability of each vector. The collection of datasets in the present study included no g vectors for the same battery from different samples, and therefore artifact distributions were based upon other studies reporting g vectors for two or more samples. So, the effect sizes and the distribution of reliabilities of the g vector were based upon different samples. When two g vectors were compared, the correlation between them was used, and when more than two g vectors were compared, the average correlation for the various combinations of two vectors was used. The combined N from the samples on which the g vector was based was taken as the weight of one data point.

Several samples were compared that differed little on background variables. For the comparisons using children, we chose samples that were highly comparable with regard to age, and for the comparisons of adults, we chose samples that were roughly comparable with regard to age. In a study on young children, Schroots and van Alphen de Veer (1979) report correlation matrices for the Leidse Diagnostische Test for eight age groups between 4 and 8 years of age. The average correlation between the g vectors of adjacent age groups is .75 (combined N = 1169). Several studies report data on both younger and older children. The Dutch/Flemish WISC-R (van Haasen et al., 1986) has samples with comparable N of Dutch and Flemish children, so the 11 age groups between 6 and 16 could be compared. This resulted in an average correlation of .78 (combined N = 3018). Jensen (1985) reports g loadings of the 12 subtests of the WISC-R obtained in three large, independent, representative samples of Black and White children. The average correlation between the g vectors obtained for each sample is .86 for the Black children (combined N = 1238) and .93 for the White children (combined N = 2868). In a study on older children, Evers and Lucassen (1991) report the correlation matrices of the Dutch DAT. The average correlation between the g vectors of three educational groups is .88 (combined N = 3300). The US GATB manual (1970, chapter 20) gives correlation matrices for large groups of boys and girls in secondary school. The average correlation between the g vectors of the same-age boys and girls is .97 (combined N = 26,708). Several studies report data on adults. The g loadings of the eight subtests of the GATB are reported by te Nijenhuis and van der Flier (1997) for applicants at Dutch Railways and by de Wolff and Buiten (1963) for seamen at the Royal Dutch Navy, resulting in a correlation of .90 (combined N = 1306). The US GATB manual (1970) gives correlation matrices for two large groups of adults, which yields a correlation between g vectors of .94 (combined N = 4519). Johnson, Bouchard, Krueger, McGue, and Gottesman (2004) report g loadings for a sample that took the WAIS, and Wechsler (1955) reports the correlation matrices of the WAIS for adults of comparable age, so g loadings could be computed. The correlation between the g vectors for the two studies is .72 (combined N = 736). So, it appears that g vectors are quite reliable, especially when the samples are very large.
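To see how the reliabilities just listed might be summarized, the sketch below treats each reported value as one data point weighted by its combined N. The text does not state exactly how the meta-analysis program summarizes this artifact distribution, so N-weighting here is an assumption; the values and Ns are those reported above.

```python
# Reported g-vector reliabilities and the combined N behind each (Section 8.2)
G_VECTOR_RELIABILITIES = [
    (0.75, 1169),   # Leidse Diagnostische Test, adjacent age groups
    (0.78, 3018),   # Dutch/Flemish WISC-R
    (0.86, 1238),   # WISC-R, Black children (Jensen, 1985)
    (0.93, 2868),   # WISC-R, White children (Jensen, 1985)
    (0.88, 3300),   # Dutch DAT, three educational groups
    (0.97, 26708),  # US GATB, secondary-school boys and girls
    (0.90, 1306),   # GATB, Dutch Railways vs. Royal Dutch Navy
    (0.94, 4519),   # US GATB, two adult groups
    (0.72, 736),    # WAIS, Johnson et al. (2004) vs. Wechsler (1955)
]

total_n = sum(n for _, n in G_VECTOR_RELIABILITIES)
weighted = sum(rel * n for rel, n in G_VECTOR_RELIABILITIES) / total_n
unweighted = sum(rel for rel, _ in G_VECTOR_RELIABILITIES) / len(G_VECTOR_RELIABILITIES)
print(f"N-weighted mean reliability: {weighted:.3f} (total N = {total_n})")
print(f"unweighted mean reliability: {unweighted:.3f}")
```

Because the N = 26,708 GATB comparison carries most of the weight, the N-weighted mean (about .93) is noticeably higher than the unweighted one (about .86).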

The number of tests in the batteries in the present study varied from 7 to 14. The number of tests does not necessarily influence the size of r_gd, but it clearly has an effect upon its variability. Because variability in the values of the artifacts influences the amount of variance that artifacts explain in observed effect sizes, we estimated this variability using data from the samples described in the previous paragraph.

8.3. Correction for reliability of the vector of score gains

The value of r_gd is attenuated by the reliability of the vector of score gains for a given battery. When two samples have a comparable N, the average correlation between vectors is an estimate of the reliability of each vector. The reliability of the vector of score gains was estimated using the present datasets, comparing samples that took the same test and that differed little on background variables. For the comparisons using children, we chose samples that were highly comparable with regard to age, and for the comparisons of adults, we chose samples that were roughly comparable with regard to age.

In the GATB manual (1970, ch. 15), 13 combinations of two studies are described in which large samples of men and women that are comparable with respect to age and background took the same GATB subtests. The average unweighted correlation between the d vectors of men and women is .83 (total N = 3760). In the GATB manual (1970, ch. 20), three combinations of three studies are described in which very large samples of boys and girls in the same grade in secondary school took the same GATB subtests. This yielded correlations between the d vectors of .99, .98, and .94, respectively (total N = 20,541). Together, van Geffen (1972) and Bosch (1973) report three Dutch GATB test–retest studies on children in secondary school, resulting in three comparisons between d vectors. The average N-weighted correlation between the d vectors is .47 (total N = 127). Vectors of score gains from two different datasets on the WISC-R were compared: Tuma and Appelbaum (1980) tested children with an average age of 10, and Wechsler (1974) tested 10- and 11-year-olds. The correlation between the two d vectors is .71 (total N = 147). Comparison of vectors of score gains from datasets on the DAT (Bennett, Seashore, & Wesman, 1974) resulted in correlations of .78 and .73, respectively, so an average r of .76 (total N = 254). So, it appears that d vectors are quite reliable, especially when the samples are very large. We estimated the reliabilities of the d vectors in the database using data from the samples described in this paragraph.
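The corrections for unreliability of the g vector and the d vector follow the classical correction for attenuation: the observed r_gd is divided by the square root of the product of the two reliabilities. A minimal sketch, with a hypothetical function name and hypothetical input values:

```python
import math


def correct_for_unreliability(r_obs: float, rel_g: float, rel_d: float) -> float:
    """Disattenuate r_gd for unreliability in both vectors.

    rel_g and rel_d are the reliabilities of the g vector and the d vector,
    i.e. correlations between vectors from comparable samples (0 < rel <= 1).
    """
    if not (0 < rel_g <= 1 and 0 < rel_d <= 1):
        raise ValueError("reliabilities must lie in (0, 1]")
    return r_obs / math.sqrt(rel_g * rel_d)


# Hypothetical: observed r_gd = -.80, g-vector reliability .90, d-vector reliability .80
print(round(correct_for_unreliability(-0.80, 0.90, 0.80), 3))
```

Because the reliabilities are at most 1, this correction can only move r_gd further from zero.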

8.4. Correction for restriction of range of g loadings

The value of r_gd is attenuated by the restriction of range of g loadings in many of the standard test batteries. The most highly g-loaded batteries tend to have the smallest range of variation in the subtests' g loadings. Jensen (1998a, pp. 381–382) shows that restriction in g loadedness strongly attenuates the correlation between g loadings and standardized group differences. Hunter and Schmidt (1990, pp. 47–49) state that the solution to range variation is to define a reference population and express all correlations in terms of that reference population. The Hunter and Schmidt meta-analytical program computes what the correlation in a given population would be if the standard deviation were the same as in the reference population. The standard deviations can be compared by dividing the study population standard deviation by the reference group population standard deviation, that is, u = SD_study/SD_ref. As the reference, we took the tests that are broadly regarded as exemplary for the measurement of the intelligence domain, namely the various versions of the Wechsler tests for children. The average standard deviation of g loadings of the various Dutch and US versions of the WISC-R and the WISC-III was 0.128. So, the SD of g loadings of all test batteries was compared to the average SD of g loadings in the Wechsler tests for children. This resulted in some batteries (such as the GATB) having a value of u larger than 1.00.
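The range correction can be sketched as follows. We assume the textbook Thorndike Case II formula for direct range restriction, R = rU / √(1 + r²(U² − 1)) with U = 1/u; the internals of the Hunter and Schmidt program may differ, for example in how artifact distributions are combined. SD_REF = 0.128 is the reference value given above; the study SD and r in the usage lines are hypothetical.

```python
import math

SD_REF = 0.128  # average SD of g loadings in Dutch and US WISC-R / WISC-III (from the text)


def u_ratio(sd_study: float, sd_ref: float = SD_REF) -> float:
    """u = SD_study / SD_ref for the g loadings of one battery."""
    return sd_study / sd_ref


def correct_range(r: float, u: float) -> float:
    """Thorndike Case II (assumed form): express r in the reference population.
    u < 1 (restricted battery) enlarges |r|; u > 1 (e.g. the GATB) shrinks it."""
    big_u = 1.0 / u
    return r * big_u / math.sqrt(1 + r ** 2 * (big_u ** 2 - 1))


u = u_ratio(0.064)  # hypothetical battery whose g loadings have SD = 0.064
print(u, round(correct_range(-0.40, u), 3))
```

The denominator stays positive for any |r| ≤ 1, so the formula is defined for both restricted (u < 1) and enhanced (u > 1) batteries.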

8.5. Correction for deviation from perfect construct validity

The deviation from perfect construct validity in g attenuates the value of r_gd. In making up any collection of cognitive tests, we do not have a perfectly representative sample of the entire universe of all possible cognitive tests. So, any one limited sample of tests will not yield exactly the same g as any other limited sample. The sample values of g are affected by psychometric sampling error, but the fact that g is very substantially correlated across different test batteries implies that the differing obtained values of g can all be interpreted as estimates of a “true” g. The value of r_gd is attenuated by psychometric sampling error in each of the batteries from which a g factor has been extracted.

The more tests and the higher their g loadings, the higher the g saturation of the composite score. The Wechsler tests have a large number of subtests with quite high g loadings, resulting in a highly g-saturated composite score. Jensen (1998a, pp. 90–91) states that the g score of the Wechsler tests correlates more than .95 with the tests' IQ score. However, shorter batteries with a substantial number of tests with lower g loadings will lead to a composite with a somewhat lower g saturation. Jensen (1998a, ch. 10) states that the average g loading of an IQ score as measured by various standard IQ tests is in the +.80s. When we take this value as an indication of the degree to which an IQ score is a reflection of “true” g, we can estimate that a test's g score correlates about .85 with “true” g. As g loadings are the correlations of tests with the g score, it is most likely that most empirical g loadings will underestimate “true” g loadings; so, empirical g loadings correlate about .85 with “true” g loadings. As the Schmidt and Le computer program only includes corrections for the first four artifacts, the correction for deviation from perfect construct validity was carried out on the value of r_gd after correction for the first four artifacts. To limit the risk of overcorrection, we conservatively chose the value of .90 for the correction.

Table 1
Dutch, British, and US studies of correlations between g loadings and gain scores

| Reference | Test | r | N | Information |
|---|---|---|---|---|
| Drenth et al. (1968) | AKIT | −.57 | 100 | Primary-school children |
| van Geffen (1972) | GATB | −.45 | 42 | Secondary-school children |
| | | −.21 | 42 | |
| Bosch (1973) | GATB | −.07 | 43 | Secondary-school children |
| Schroots and van Alphen de Veer (1979) | LDT | −.42 | 96 | Pre-school and secondary-school children |
| Bleichrodt et al. (1987) | RAKIT | .09 | 49 | Pre-school children |
| | | −.25 | 51 | Primary-school children |
| | | −.21 | 49 | Primary-school children |
| van der Doef et al. (1989) | WISC-R | −.69 | 22 | Primary-school children with learning problems |
| Mulder et al. (2004) | KAIT | −.23 | 46 | Secondary-school children + young adults |
| | | −.42 | 25 | Adults |
| Kort et al. (2005) | WISC-III | −.15 | 42 | Primary-school children |
| | | −.26 | 67 | Primary-school children |
| | | −.46 | 39 | Secondary-school children |
| Luteijn and Barelds (2005) | GIT2 | −.51 | 44 | Adults |
| Kooij et al. (2005) | WAIS-III | −.63 | 60 | Adults |
| Elliott (1983) | BAS | −.65 | 60 | Primary-school children |
| Wechsler (1967) | WPPSI | −.46 | 50 | Pre-school children |
| United States Department of Labor (1970) | GATB | −.35 | 156 | Office applicants |
| | | −.66 | 605 | Male high school seniors |
| | | −.70 | 554 | Female high school seniors |
| | | −.58 | 223 | Males, 1-day interval |
| | | −.41 | 186 | Females, 1-day interval |
| | | −.50 | 202 | Males, 2-week interval |
| | | −.52 | 152 | Females, 2-week interval |
| | | −.67 | 156 | Males, 6-week interval |
| | | −.61 | 168 | Females, 6-week interval |
| | | −.43 | 176 | Males, 13-week interval |
| | | .02 | 149 | Females, 13-week interval |
| | | −.62 | 157 | Males, 26-week interval |
| | | −.32 | 136 | Females, 26-week interval |
| | | −.69 | 119 | Males, 1-year interval |
| | | −.31 | 183 | Females, 1-year interval |
| | | −.96 | 118 | Males, 2-year interval |
| | | −.75 | 170 | Females, 2-year interval |
| | | −.75 | 123 | Males, 3-year interval |
| | | −.48 | 183 | Females, 3-year interval |
| | | −.92 | 3398 | Boys, secondary school |
| | | −.92 | 3680 | Girls, secondary school |
| | | −.91 | 3348 | Boys, secondary school |
| | | −.91 | 3491 | Girls, secondary school |
| | | −.84 | 3229 | Boys, secondary school |
| | | −.87 | 3395 | Girls, secondary school |
| Wechsler (1974) | WISC-R | −.48 | 97 | Primary-school children |
| | | −.66 | 102 | Primary-school children |
| | | −.21 | 104 | Secondary-school children |
| Bennett et al. (1974) | DAT | −.79 | 92 | Boys, secondary school |
| | | −.53 | 81 | Girls, secondary school |
| | | −.29 | 81 | Boys, secondary school |
| | | −.62 | 100 | Girls, secondary school |
| Covin (1977) | WISC-R | −.57 | 30 | Primary-school children with learning problems |
| Tuma and Appelbaum (1980) | WISC-R | −.08 | 45 | Primary- and secondary-school children |
| Matarazzo et al. (1980) | WAIS | −.10 | 29 | Young males |
| Wechsler (1981) | WAIS-R | −.64 | 71 | Adults |
| | | −.48 | 48 | Adults |
| McCormick et al. (1983) | ASVAB | −.73 | 57 | Adults |
| Kaufman and Kaufman (1983) | K-ABC | −.27 | 46 | Pre-school children |
| | | −.18 | 36 | Pre- and primary-school children |
| | | −.22 | 70 | Primary-school children |
| Wechsler (1997) | WAIS-III | −.45 | 100 | Young adults |
| | | −.57 | 102 | Adults |
| | | −.51 | 104 | Adults |
| | | .03 | 88 | Adults |
| Reeve and Lam (2005) | EAS | −.34 | 123 | Undergraduate students |

In general, the g loadings were based on the correlation matrix taken from the manuals containing the test–retest studies, or on the correlation matrix based on the largest sample size we could find. The sources of the g loadings, when not taken from the manuals containing the test–retest study, were as follows. van Geffen (1972) and Bosch (1973): de Wolff and Buiten (1963); see also Johnson, te Nijenhuis, and Bouchard (in press). Bleichrodt et al. (1987): te Nijenhuis et al. (2004), who used the same data on which the RAKIT manual is based. van der Doef, Kwint, and van der Koppel (1989): Dutch WISC-R manual. Elliott (1983): Table 9.8, age 9:0–9:11 years. US Department of Labor's GATB (1970): Jensen (1985, p. 214), using the largest correlation matrix in the GATB manual. Wechsler (1974), Covin (1977), and Tuma and Appelbaum (1980): Jensen (1985, p. 214, first study). Bennett et al. (1974): average of four highly similar correlation matrices. Matarazzo et al. (1980): Wechsler's (1955, p. 17) Table 8, for ages 25–34. McCormick et al. (1983): Ree and Carretta (1994). Reeve and Lam (2005) used SEM analyses and item parcels instead of full-scale scores to compute g loadings; the average g loading of all the item parcels for a specific subtest was taken as the g loading of that subtest.

9. Results

The results of the studies on the correlation between g loadings and gain scores are shown in Table 1. The table gives data derived from 64 studies, with participants numbering a total of 26,990. It gives the reference for each study, the cognitive ability test used, the correlation between g loadings and gain scores, the sample size, and background information on the study. It is clear that virtually all correlations are negative and that the size of the few positive correlations is very small.
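The first columns of Table 2 (K, N, and the sample-size-weighted mean observed r) can be recomputed directly from the 64 (r, N) pairs transcribed from Table 1, as in the sketch below. The text does not say which three outliers were excluded; we assume they are the three positive correlations (.09, .02, and .03), because removing them reproduces the K = 61 and N = 26,704 of Table 2. The corrected values (ρ, SDρ) are not recomputed here, because the artifact distributions are not tabulated.

```python
# (r, N) pairs transcribed from Table 1, in table order
TABLE_1 = [
    (-0.57, 100), (-0.45, 42), (-0.21, 42), (-0.07, 43), (-0.42, 96),
    (0.09, 49), (-0.25, 51), (-0.21, 49), (-0.69, 22), (-0.23, 46),
    (-0.42, 25), (-0.15, 42), (-0.26, 67), (-0.46, 39), (-0.51, 44),
    (-0.63, 60), (-0.65, 60), (-0.46, 50),
    # United States Department of Labor (1970), GATB
    (-0.35, 156), (-0.66, 605), (-0.70, 554), (-0.58, 223), (-0.41, 186),
    (-0.50, 202), (-0.52, 152), (-0.67, 156), (-0.61, 168), (-0.43, 176),
    (0.02, 149), (-0.62, 157), (-0.32, 136), (-0.69, 119), (-0.31, 183),
    (-0.96, 118), (-0.75, 170), (-0.75, 123), (-0.48, 183),
    (-0.92, 3398), (-0.92, 3680), (-0.91, 3348), (-0.91, 3491),
    (-0.84, 3229), (-0.87, 3395),
    # Wechsler (1974) through Reeve and Lam (2005)
    (-0.48, 97), (-0.66, 102), (-0.21, 104),
    (-0.79, 92), (-0.53, 81), (-0.29, 81), (-0.62, 100),
    (-0.57, 30), (-0.08, 45), (-0.10, 29), (-0.64, 71), (-0.48, 48),
    (-0.73, 57), (-0.27, 46), (-0.18, 36), (-0.22, 70),
    (-0.45, 100), (-0.57, 102), (-0.51, 104), (0.03, 88), (-0.34, 123),
]


def summarize(data):
    """K, total N, and sample-size-weighted mean r."""
    k = len(data)
    n = sum(ni for _, ni in data)
    mean_r = sum(r * ni for r, ni in data) / n
    return k, n, mean_r


all_k, all_n, all_r = summarize(TABLE_1)
# Assumption: the three excluded outliers are the three positive correlations
trim_k, trim_n, trim_r = summarize([(r, n) for r, n in TABLE_1 if r <= 0])
print(all_k, all_n, round(all_r, 2))     # compare Table 2, first row
print(trim_k, trim_n, round(trim_r, 2))  # compare Table 2, second row
```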

Table 2 shows the results of the psychometric meta-analysis of the 64 data points. From left to right, it shows the number of correlation coefficients (K), the total sample size (N), the mean observed correlation (r) and its standard deviation (SDr), the true correlation one can expect once artifactual error from unreliability in the g vector and the d vector and from range restriction in the g vector has been removed (ρ), and its standard deviation (SDρ). The next two columns present the percentage of variance explained by artifactual errors (%VE) and the 95% credibility interval (95% CI). This interval denotes the values one can expect for ρ in 19 out of 20 cases.

Table 2
Meta-analysis results for correlations between g loadings and gain scores after corrections for reliability and restriction of range

| Studies included | K | N | r | SDr | ρ | SDρ | %VE | 95% CI |
|---|---|---|---|---|---|---|---|---|
| All | 64 | 26,990 | −.80 | .20 | −.95 | .11 | 81 | −0.74 to −1.16 |
| All − 3 outliers | 61 | 26,704 | −.81 | .18 | −.95 | .03 | 99 | −0.91 to −1.00 |

K = number of correlations; N = total sample size; r = mean observed correlation (sample-size weighted); SDr = standard deviation of observed correlation; ρ = true correlation (observed correlation corrected for unreliability and range restriction); SDρ = standard deviation of true correlation; %VE = percentage of variance accounted for by artifactual errors; 95% CI = 95% credibility interval.

The large number of data points and the very large sample size indicate that we can have confidence in the outcomes of this meta-analysis. The estimated true correlation has a value of −.95, and 81% of the variance in the observed correlations is explained by artifactual errors. However, Hunter and Schmidt (1990) state that extreme outliers should be left out of the analyses, because they are most likely the result of errors in the data. They also argue that strong outliers artificially inflate the SD of effect sizes and thereby reduce the amount of variance that artifacts can explain. We chose to leave out three outliers (more than 4 SD away from the average r and more than 8 SD away from ρ), comprising 1% of the research participants. This resulted in no change in the value of the true correlation, a large decrease (74%) in the SD of ρ, and a large increase (22%) in the amount of variance in the observed correlations explained by artifacts. So, when the three outliers are excluded, artifacts explain virtually all of the variance in the observed correlations. Finally, a correction for deviation from perfect construct validity in g took place, using a conservative value of .90. This resulted in a value of −1.06 for the final estimated true correlation between g loadings and score gains. Applying several corrections in a meta-analysis may lead to correlations whose absolute value exceeds 1.00, as is the case here. Percentages of variance accounted for by artifacts larger than 100% are also not uncommon in psychometric meta-analysis. Such values also occur in other methods of statistical estimation (see Hunter & Schmidt, 1990, pp. 411–414, for a discussion).
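The construct-validity correction of Section 8.5 is a single division of the corrected correlation by the assumed correlation between empirical and “true” g loadings. The sketch applies it to ρ = −.95 with the .90 used here and, for comparison, with the less conservative .85 estimated in Section 8.5; the function name is hypothetical.

```python
def correct_construct_validity(rho: float, a: float = 0.90) -> float:
    """Divide a corrected correlation by the assumed correlation a between
    empirical and 'true' g loadings (.90 in the text, chosen conservatively)."""
    if not 0 < a <= 1:
        raise ValueError("a must lie in (0, 1]")
    return rho / a


final = correct_construct_validity(-0.95)
print(round(final, 2))                                    # value used in the text
print(round(correct_construct_validity(-0.95, 0.85), 2))  # less conservative choice
```

The smaller the assumed value of a, the further the estimate is pushed beyond −1.00, which is why the larger value of .90 limits the risk of overcorrection.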

10. Discussion

A large-scale meta-analysis of 64 test–retest studies shows that, after corrections for several artifacts, there is an estimated true correlation of −1.06 between the g loading of tests and score gains, and virtually all of the variance in observed correlations is attributable to these artifacts. As several artifacts explain virtually all the variance in the effect sizes, other dimensions on which the studies differ, such as age of the test takers, test–retest interval, test used, and average-IQ samples versus samples with learning problems, play no role at all.

The estimated true correlation of −1.06 is the result of various corrections for artifacts that attenuate the correlations. The estimated values of the artifacts may underestimate or overestimate the population values of the artifacts. Therefore, estimates of true effect sizes may overestimate or underestimate the population values of the effect size. As a solution to this problem, Hunter and Schmidt (2004) suggest carrying out several meta-analyses on the same construct and taking the average estimated effect size of all meta-analyses. The general idea is that meta-analysis is a powerful research tool but does not give perfect outcomes.

A correlation of −1.06 falls outside the range of acceptable values of a correlation, but one has to distinguish between the meta-analytical estimate of the true correlation between g and d and the true correlation between g and d itself. We interpret the value of −1.06 for the meta-analytical estimate as meaning that the true correlation between g and d is −1.00. A correlation of −1.00 means that there is a perfectly inverse relationship between g loadings and score gains: the tests with the highest g loadings show the smallest gains. The most straightforward interpretation of this very large negative correlation is that there is no g saturation in test–retest gain scores.

11. The South African learning potential study

In a carefully carried-out study, Skuy et al. (2002) used a dynamic testing procedure to see whether it would improve the scores of Black South African students on Raven's Standard Progressive Matrices (RSPM). The Bantu Education Act of 1954 established a discriminatory educational system characterized by poorly qualified teachers, sparsely equipped and funded schools, and generally poor quality. Most Black students in the sample had not received the same quality of education as White students. Black, White, Indian, and Colored research participants took the RSPM on two occasions, and in between, randomly constituted experimental groups were exposed to the Mediated Learning Experience. Both the Black South African group and the group consisting of White, Indian, and Colored South Africans improved over their baseline on the RSPM, and the Black group showed greater improvement.

The value of these cognitive interventions increases when the score gains transfer to other tests and to external criteria, such as school or work achievement. Therefore, the research participants also took Feuerstein's Representational Stencil Design Test as a transfer measure. The subject is presented with a stencil of a geometric design and is then asked to indicate which stencils need to be used, and in what sequence, in order to construct an identical design. Like the RSPM, the Stencils test also requires representational/abstract thinking, but the training on the RSPM showed little transfer to it. Moreover, the correlation of the RSPM scores with performance in the end-of-year psychology examination did not significantly improve after mediation. Once again, the score gains were empty: they did not generalize. Skuy et al. go on to ask what it was that their interventions improved. Professor Skuy made his data accessible to the present authors, so we could perform additional analyses.

12. Sample

The data from Skuy et al. (2002) were used, with the exception of data from three research participants whose pretest IQ scores were extremely low (more than 3 SDs below the group mean). Ninety-five university students in psychology, aged 16 to 29 (mean age = 20, SD = 2.3; 25 males, 70 females), participated in this study. They were 66 Black students (20 males, 46 females) and 29 White (20), Indian (6), and Colored (3) students (5 males, 24 females). The mean age of the Black group was 20 (SD = 2.5), and that of the White/Indian/Colored group was 19 years (SD = 1). Subjects were randomly assigned to the experimental group (n = 55) and to the control group (n = 40).


13. Procedure

The students participated in pre- and posttest phases, with a group intervention in between. The study focused on improvement in scores on the RSPM, using the Set Variations II of the Learning Propensity Assessment Device as the mediation task. Mediation training took 3 h and was conducted by three experienced psychologists with the assistance of six postgraduate psychology students. A detailed description is given in Skuy et al. (2002).

14. Measures and cognitive intervention

The Raven's Standard Progressive Matrices consists of 60 items (divided into 5 sets of 12 items) designed to measure the ability to form comparisons, to reason by analogy, and to organize spatial information into related wholes. It has been established as one of the purest measures of g (Jensen, 1998a). Skuy et al. (2002) found no evidence for test bias against Blacks in South African education. Rushton, Skuy, and Bons (2004) showed that the Raven's gave comparable predictive validities for students from various groups. Cross-cultural testing research has clearly shown that insufficient proficiency in the language of the test can lead to biased assessments in tests with a strong verbal component. However, the Raven's is a non-verbal test.

The Learning Propensity Assessment Device consists of 14 exercises. Each exercise contains an initial mediation task. Subsequent tasks increase in complexity and novelty and aim to assist the learner to achieve mastery over the task. The purpose of mediation is to assist the learner to develop the appropriate cognitive strategies and functions needed for the successful completion of the task. The Set Variations II of the Learning Propensity Assessment Device consists of five sets of items, which comprise variations of Sets C, D, and E of the RSPM test. Each set of variations contains a learning task for the purpose of initial mediation, followed by a series of progressively more difficult variations to which the skills learned must be applied. Mediation involves discussing with groups how to define the problem to be solved, focus on the task, set rules, regulate problem-solving behavior, and identify the correct sequence of logical steps needed to solve the task. Mediation also involves helping the subject to develop appropriate concepts, verbal tools, and insights in relation to the task. A detailed description is given in Skuy et al. (2002).

15. Statistical analyses

Although the Skuy et al. study is among the South African learning potential studies with the largest sample size, the N is not large. We therefore chose basic statistical analyses.

15.1. Descriptive statistics

Means, standard deviations, and reliabilities were computed for the various groups. With regard to measures of effect size, Hunter and Schmidt (1990, p. 271) advise choosing the estimates of variance with the least error. Because repeated test takings tend to change the size of the SD (Ackerman, 1987), we chose the SD of the pretests for the denominator. The correlation between scores before and after the training was computed to see whether the training had an effect on the rank order of individuals' scores.
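The effect-size definition above (posttest mean minus pretest mean, divided by the pretest SD) takes only a few lines. Applied to the means and pretest SDs in Table 4, it reproduces the effect sizes reported there (0.95, 0.43, 0.93, and 0.77). The dictionary and function names are hypothetical.

```python
# Pretest mean, posttest mean, and pretest SD per group, from Table 4
groups = {
    "Black experimental": (43.78, 50.10, 6.64),
    "Black control": (45.46, 48.35, 6.69),
    "Other experimental": (50.20, 55.80, 6.05),
    "Other control": (52.71, 55.36, 3.45),
}


def gain_d(pre_mean: float, post_mean: float, pre_sd: float) -> float:
    """Standardized gain: (posttest mean - pretest mean) / pretest SD."""
    return (post_mean - pre_mean) / pre_sd


for name, (pre, post, sd) in groups.items():
    print(f"{name}: d = {gain_d(pre, post, sd):.2f}")
```

Using the pretest SD keeps the denominator unaffected by the change in spread that retesting can produce, which is the reason given above.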

15.2. Correlation between score gains and g loadedness

Because our sample was not large and was quite specific, estimates of g loadedness were taken from Lynn, Allik, and Irwing's (2004) item analysis of the RSPM in Estonia, using a large (N = 2735), nationally representative sample. The same reasoning as in psychometric meta-analysis applies, namely that larger samples give better estimates of g loadings than smaller samples. In a hierarchical factor analysis of the items using structural equation modeling, Lynn et al. computed g loadings for 52 of the 60 items. In the present study, Pearson correlations were calculated between the g loadings of these 52 items and the effect sizes on these items.

15.3. g loadings

The RSPM consists of dichotomous items, so we computed a correlation matrix of polychoric correlations (Nunnally & Bernstein, 1994). A principal axis factor analysis was carried out. The percentage of variance explained by the first unrotated factor was taken as an estimate of g loadedness. Because sample size was limited, we collapsed the experimental and the control group.
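The percentage of variance explained by the first unrotated factor can be illustrated with a one-factor iterated principal axis factoring. This sketch assumes the correlation matrix (for example, the polychoric matrix) has already been estimated, since polychoric estimation needs a dedicated routine that is not shown. The starting communalities, iteration counts, and the power-iteration eigen-solver are our own choices, not necessarily those of the software the authors used; the pure-Python loops are meant for illustration, not for speed on 60 items.

```python
import math


def first_factor_paf(R, n_iter=200, power_iter=100):
    """One-factor iterated principal axis factoring of a correlation matrix R
    (list of lists). Returns the first-factor loadings and the % of total
    variance explained (sum of squared loadings / number of variables)."""
    p = len(R)
    # Starting communalities: largest absolute off-diagonal correlation per row
    h = [max(abs(R[i][j]) for j in range(p) if j != i) for i in range(p)]
    loadings = [0.0] * p
    for _ in range(n_iter):
        # Reduced correlation matrix: communalities on the diagonal
        Rh = [[h[i] if i == j else R[i][j] for j in range(p)] for i in range(p)]
        # Power iteration for the dominant eigenvector of Rh
        v = [1.0 / math.sqrt(p)] * p
        for _ in range(power_iter):
            w = [sum(Rh[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        Rv = [sum(Rh[i][j] * v[j] for j in range(p)) for i in range(p)]
        lam = sum(v[i] * Rv[i] for i in range(p))  # Rayleigh quotient
        loadings = [math.sqrt(max(lam, 0.0)) * x for x in v]
        h = [x * x for x in loadings]  # updated communalities
    pct_var = 100 * sum(h) / p
    return loadings, pct_var


# Check on a hypothetical matrix with an exact one-factor structure
true_l = [0.8, 0.7, 0.6, 0.5]
R = [[1.0 if i == j else true_l[i] * true_l[j] for j in range(4)] for i in range(4)]
loadings, pct = first_factor_paf(R)
print([round(x, 3) for x in loadings], round(pct, 1))
```

On this toy matrix the procedure should recover the generating loadings; on real item data the first-factor percentage is the quantity reported as 22% (pretest) and 18% (posttest) in Section 16.3.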

15.4. Correlation between sum scores and score gains

We tested whether individuals with low g improved their scores more than those with high g by correlating gain scores with pretest RSPM scores for each of the four research groups. As gain scores tend to be negatively correlated with pretest scores as a function of unreliability (see Cronbach, 1990; Nunnally & Bernstein, 1994), we corrected the correlations using Tucker, Damarin, and Messick's (1966) formula 63. Using the formula, one adds to each correlation the term (SD_pretest/SD_gain score)(1 − reliability_pretest).

Table 3
Proportion of sample selecting the correct answer on items of Raven's Standard Progressive Matrices, by group

| Set A item | Black | Other^a | Set B item | Black | Other | Set C item | Black | Other | Set D item | Black | Other | Set E item | Black | Other |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1.00 | 1.00 | 13 | 1.00 | 1.00 | 25 | 1.00 | .97 | 37 | 1.00 | 1.00 | 49 | .74 | .90 |
| 2 | .97 | 1.00 | 14 | 1.00 | 1.00 | 26 | .96 | 1.00 | 38 | .99 | 1.00 | 50 | .64 | .90 |
| 3 | .97 | 1.00 | 15 | 1.00 | 1.00 | 27 | .96 | 1.00 | 39 | .89 | 1.00 | 51 | .79 | .97 |
| 4 | 1.00 | .97 | 16 | .91 | .97 | 28 | .86 | .93 | 40 | .92 | 1.00 | 52 | .56 | .83 |
| 5 | 1.00 | 1.00 | 17 | .96 | .97 | 29 | .94 | .97 | 41 | .96 | 1.00 | 53 | .52 | .83 |
| 6 | .99 | 1.00 | 18 | .85 | 1.00 | 30 | .76 | .83 | 42 | .92 | 1.00 | 54 | .35 | .76 |
| 7 | .94 | .97 | 19 | .77 | .66 | 31 | .88 | .97 | 43 | .77 | 1.00 | 55 | .42 | .79 |
| 8 | .91 | .93 | 20 | .79 | .97 | 32 | .50 | .79 | 44 | .76 | .93 | 56 | .21 | .69 |
| 9 | 1.00 | .97 | 21 | .83 | .97 | 33 | .74 | .90 | 45 | .71 | .97 | 57 | .30 | .41 |
| 10 | .91 | .97 | 22 | .92 | 1.00 | 34 | .61 | .79 | 46 | .79 | .93 | 58 | .12 | .41 |
| 11 | .83 | .90 | 23 | .80 | .90 | 35 | .53 | .69 | 47 | .29 | .41 | 59 | .02 | .17 |
| 12 | .68 | .83 | 24 | .59 | .83 | 36 | .06 | .35 | 48 | .26 | .38 | 60 | .11 | .21 |

a Other = White, Indian, and Colored.
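The Tucker, Damarin, and Messick adjustment described in Section 15.4 adds one term to each observed correlation between pretest and gain scores. A minimal sketch, with a hypothetical function name and hypothetical inputs:

```python
def tdm_corrected_r(r_pre_gain, sd_pretest, sd_gain, rel_pretest):
    """Observed pretest-gain correlation plus the term
    (SD_pretest / SD_gain) * (1 - reliability_pretest), as described in the text.
    The term is meant to offset the negative pretest-gain correlation that
    measurement error in the pretest produces on its own."""
    if sd_gain <= 0:
        raise ValueError("SD of gain scores must be positive")
    return r_pre_gain + (sd_pretest / sd_gain) * (1 - rel_pretest)


# Hypothetical inputs: r = -.50, SD_pretest = 6, SD_gain = 4, pretest reliability .80
print(round(tdm_corrected_r(-0.50, 6.0, 4.0, 0.80), 2))
```

With a perfectly reliable pretest the added term vanishes and the observed correlation is returned unchanged.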

16. Results

16.1. Descriptive statistics

Internal consistencies (Cronbach's αs) on the RSPM ranged from .76 to .86 for the pre- and posttests, respectively. Table 3 shows the proportion of each of the groups that selected the correct answer on each of the 60 items of the pretest. Across the 60 items, the order of the p values was almost identical for Blacks and White/Indian/Coloreds (r = .92, p = .00).

Table 4 shows the means and standard deviations of the total RSPM scores for the four groups, along with the d effect sizes representing the difference between pre- and posttest scores (Cohen, 1988). First, we examined whether there was an effect of race (Black vs. White/Indian/Colored) and group (experimental vs. control) on the pretest scores. There was a significant effect of race (F(1, 91) = 24.13, p = .00, η² = .21) but not of group (F(1, 91) = 2.28, p = .14, η² = .02). This means that the mean pretest scores of Blacks (M = 44.44, SD = 6.65) were lower than those of White/Indian/Coloreds (M = 51.41, SD = 5.05), and that the mean pretest scores of the experimental and control groups were comparable (M = 45.53, SD = 7.04 and M = 48.0, SD = 6.7, respectively).

Secondly, we investigated the effects of training on the posttest scores by performing a two-way ANCOVA on the total posttest scores, with race and group as factors and the total pretest scores as the covariate. There was a significant effect of group (F(1, 95) = 13.81, p = .00, η² = .13) and of race (F(1, 90) = 3.99, p = .05, η² = .04), but not of the two-way interaction of group and race (F(1, 90) = 0.28, p = .60, η² = .00). These results indicate that the training was equally effective for the Black and the White/Indian/Colored students. Posttest scores of Blacks (M = 49.41, SD = 5.91), however, remained significantly lower (F(1, 91) = 28.33, p = .00) than those of Whites/Indians/Coloreds (M = 55.59, SD = 3.55). Although posttest scores of the experimental group (M = 51.65, SD = 5.53) were higher than those of the control group (M = 50.8, SD = 6.65), the difference between the two groups was nonsignificant (F(1, 91) = 0.85, p = .36).

Table 4
Pre- and posttest mean Raven's scores, standard deviations, and mean effect sizes for Black and White/Indian/Colored students

                  Black experimental    Black control         Other(a) experimental Other(a) control
                  (n = 40)              (n = 26)              (n = 15)              (n = 14)
                  Pretest    Posttest   Pretest    Posttest   Pretest    Posttest   Pretest    Posttest
Raw scores
  M               43.78      50.10      45.46      48.35      50.20      55.80      52.71      55.36
  SD              6.64       5.31       6.69       6.71       6.05       3.76       3.45       3.43
Percentile        14         41         16         31         41         75         55         68
Effect size       0.95                  0.43                  0.93                  0.77

Percentiles are based on US adult norms; see Raven, Raven, and Court (2000), Table SPM13.
(a) Other = White/Indian/Colored.

294 J. te Nijenhuis et al. / Intelligence 35 (2007) 283–300
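The effect sizes in Table 4 and the η² values reported with the F tests can be checked with a few lines of arithmetic. This sketch assumes that each d divides the raw gain by that group's own pretest SD, and that the reported η² is partial η², recovered as F·df_effect / (F·df_effect + df_error). Both assumptions reproduce the reported values, but the authors' exact procedure is not stated here; the group labels and function names are illustrative.

```python
# Table 4 cell values for raw RSPM scores: (pretest M, pretest SD, posttest M).
TABLE4 = {
    "Black experimental": (43.78, 6.64, 50.10),
    "Black control": (45.46, 6.69, 48.35),
    "Other experimental": (50.20, 6.05, 55.80),
    "Other control": (52.71, 3.45, 55.36),
}


def gain_d(pre_mean, pre_sd, post_mean):
    """Standardized gain: (posttest mean - pretest mean) / pretest SD."""
    return (post_mean - pre_mean) / pre_sd


def partial_eta_squared(f, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return (f * df_effect) / (f * df_effect + df_error)


effect_sizes = {group: gain_d(*cells) for group, cells in TABLE4.items()}
# Rescale a d value to the conventional IQ metric (SD = 15).
iq_gain_black_exp = effect_sizes["Black experimental"] * 15

for group, d in effect_sizes.items():
    print(f"{group:20s} d = {d:.2f}")
print(f"Black experimental gain = {iq_gain_black_exp:.1f} IQ points")
# Race effect on the pretest, reported as F(1, 91) = 24.13.
print(f"partial eta^2 = {partial_eta_squared(24.13, 1, 91):.2f}")
```

With these inputs the sketch prints d = 0.95, 0.43, 0.93, and 0.77, a gain of about 14.3 IQ points for the Black experimental group (the roughly 14 points mentioned in the Discussion), and a partial η² of 0.21 for the race effect on the pretest.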

The correlation between scores before and after the training was .84 (p = .00) for the experimental group and .90 (p = .00) for the control group, showing that the training had only a limited effect on the rank order of individuals' scores. This means that the test strongly, but not perfectly, measures the same constructs on both occasions.

16.2. Correlation between score gains and g loadedness

We estimated item-level effect sizes for each of the four groups (race by condition) by computing the difference between mean pretest and posttest scores, divided by the standard deviation of the pretest scores of the Black and the White/Indian/Colored students, respectively. Finally, we calculated the correlations between these effect sizes and the g loadings taken from Lynn et al. Correlations were −.24 (p = .10) for the Black experimental group, −.21 (p = .20) for the White/Indian/Colored experimental group, −.08 (p = .59) for the Black control group, and −.41 (p = .01) for the White/Indian/Colored control group. Small sample sizes usually attenuate correlations (Hunter & Schmidt, 1990). Collapsing the groups indeed resulted in higher average correlations: −.39 for the complete experimental group and −.26 for the complete control group.
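The vector correlation at the heart of the method of correlated vectors is an ordinary Pearson correlation computed across items rather than across persons. The sketch below uses six hypothetical items, not the 60 RSPM items or the Lynn et al. loadings; in the study, one vector holds the item g loadings and the other the item-level gains in pretest-SD units.

```python
import math


def pearson(x, y):
    """Pearson correlation between two vectors of equal length."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical values for six items (NOT the study's data).
g_loadings = [0.30, 0.45, 0.50, 0.60, 0.65, 0.70]   # item g loadings
item_gain_d = [0.60, 0.55, 0.40, 0.45, 0.30, 0.25]  # item gains in pretest-SD units

r_gd = pearson(g_loadings, item_gain_d)
print(f"vector correlation r(g, d) = {r_gd:.2f}")
```

With these made-up values the correlation is about −.92; a negative value means that the items with the highest g loadings gained the least.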

16.3. g loadings

Using the combined experimental and control group, principal axis factor analyses of the pretest and posttest scores resulted in a first unrotated factor explaining 22% of the variance in the pretest scores and 18% of the variance in the posttest scores. These findings suggest that the g loadedness of the RSPM decreased substantially after the Mediated Learning Experience.
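The share of variance explained by the first unrotated factor can be obtained with iterated principal axis factoring. The sketch below is a plain-Python illustration, not the software the authors used: it starts from the largest absolute off-diagonal correlation in each row as the communality estimate (an assumed choice), extracts the dominant eigenpair of the reduced correlation matrix by power iteration, and repeats until the communalities stabilize. The correlation matrix and function name are hypothetical.

```python
import math


def first_principal_axis_factor(R, max_iter=500, tol=1e-10, power_steps=200):
    """Iterated principal axis factoring, first unrotated factor only.

    R: correlation matrix as a list of lists. Returns (loadings, share), where
    share = sum of squared loadings / number of variables. Assumes the first
    factor's eigenvalue clearly dominates, so power iteration finds it.
    """
    p = len(R)
    # Starting communalities: largest absolute off-diagonal r in each row.
    h = [max(abs(R[i][j]) for j in range(p) if j != i) for i in range(p)]
    v = [1.0 / math.sqrt(p)] * p
    lam = 0.0
    for _ in range(max_iter):
        # Reduced correlation matrix: communality estimates on the diagonal.
        Rh = [[h[i] if i == j else R[i][j] for j in range(p)] for i in range(p)]
        for _ in range(power_steps):
            w = [sum(Rh[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        # Rayleigh quotient: dominant eigenvalue for the unit vector v.
        lam = sum(v[i] * sum(Rh[i][j] * v[j] for j in range(p)) for i in range(p))
        new_h = [lam * x * x for x in v]  # squared loadings = new communalities
        converged = max(abs(a - b) for a, b in zip(new_h, h)) < tol
        h = new_h
        if converged:
            break
    loadings = [math.sqrt(lam) * x for x in v]
    return loadings, lam / p


# Hypothetical one-factor correlation matrix with loadings .8, .7, .6, .5.
true_loadings = [0.8, 0.7, 0.6, 0.5]
R = [[1.0 if i == j else a * b for j, b in enumerate(true_loadings)]
     for i, a in enumerate(true_loadings)]
loadings, share = first_principal_axis_factor(R)
print([round(x, 3) for x in loadings], round(share, 3))
```

For an exact one-factor matrix the procedure should recover the generating loadings and a first-factor share of .435. Applied separately to pretest and posttest item correlations, this share is the kind of quantity reported above as 22% and 18%.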

16.4. Correlation between score gains and sum score

Correlating score gains with RSPM total scores resulted in values of −.60 (p = .00) for the Black experimental group, −.18 (p = .38) for the Black control group, −.82 (p = .00) for the White/Indian/Colored experimental group, and −.48 (p = .08) for the White/Indian/Colored control group. After applying the correction formula of Tucker et al. (1966), these correlations became −.39, −.08, −.61, and −.35, respectively. Overall, these correlations show that low-g persons improved their scores more strongly than high-g persons.

17. Discussion

Skuy et al. (2002) hypothesized that the low-quality education of Blacks in South Africa would lead IQ tests to underestimate their cognitive abilities. Groups of Black and White/Indian/Colored students took Raven's Progressive Matrices twice and in between received Feuerstein's Mediated Learning Experience. The test scores went up substantially in all groups. Evidence for an authentic change in the g factor requires broad transfer, or generalizability across a wide variety of cognitive performance. However, Skuy et al. showed that the gains did not generalize to scores on another, highly similar test or to external criteria, and were therefore hollow. As the score gains were in some cases quite large (14 IQ points for the Black experimental group), the question becomes: what is it that improved?

The findings show that the correlations between score gains and the g loadedness of the items were −.39 for the complete experimental group and −.26 for the complete control group. However, because the g loadings and gain scores are measured at the item level, their reliabilities are not high, resulting in substantial attenuation of the correlation between g and d. Moreover, the RSPM does not measure g perfectly: Jensen (1998a, p. 91) estimates its g loading at .83. When we estimate the reliability of the g vector at .70 and the reliability of the gain score vector at .50, corrections for unreliability and for deviation from perfect construct validity of g alone result in estimated true correlations of −.80 and −.53, respectively. These values should be taken as underestimates; controlling for additional artifacts will bring them closer to the very strong negative correlation found in the meta-analysis.
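The corrections in this paragraph follow the standard disattenuation formula: divide the observed correlation by the square root of the product of the two vector reliabilities, and additionally by the RSPM's g loading to correct for imperfect construct validity. The sketch below plugs in the values given in the text (.70, .50, and .83); the function name is illustrative.

```python
import math


def corrected_r(r_obs, rel_g, rel_d, g_validity=1.0):
    """Disattenuate an observed vector correlation.

    Divides by sqrt(rel_g * rel_d) for unreliability of the two vectors and
    by g_validity for the test's imperfect measurement of g.
    """
    return r_obs / (math.sqrt(rel_g * rel_d) * g_validity)


for label, r in [("experimental", -0.39), ("control", -0.26)]:
    print(label, round(corrected_r(r, rel_g=0.70, rel_d=0.50, g_validity=0.83), 2))
```

It prints −0.79 for the experimental group and −0.53 for the control group. The second matches the text; the first is 0.01 away from the reported −.80, a gap that is presumably due to rounding of the observed −.39.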

The findings suggest that after training the g loadedness of the test decreased substantially. We found substantial negative correlations between gain scores and RSPM total scores. Table 4 shows that the total score variance decreased after training, which is in line with low-g subjects improving more than high-g subjects. Since, as a rule, high-g individuals profit the most from training (as is reflected in the ubiquitous positive correlation between IQ scores and training performance; Jensen, 1980; Schmidt & Hunter, 1998), these findings could be interpreted as an indication that Feuerstein's Mediated Learning Experience is not g-loaded, in contrast with regular trainings, which are clearly g-loaded. Substantial negative correlations between gain scores and RSPM total scores are no definite proof of this hypothesis, but they are in line with it. Additional support for our hypothesis that the Feuerstein training has little or no g loadedness comes from Coyle (2006), who showed that gain scores loaded virtually zero on the g factor. Moreover, Skuy et al. reported that the predictive validity of their measure did not increase when the second Raven score was used. The fact that individuals with low g gained more than those with high g could likewise be interpreted as an indication that the Mediated Learning Experience was not g-loaded. It should be noted, however, that Feuerstein most likely did not intend his intervention to be g-loaded: he was interested in increasing the performance of low scorers on both tests and external criteria.

18. General discussion

IQ scores are by far the best general predictor of success in education, job training, and work. However, there are many ways in which these IQ scores can be increased, for instance by means of retesting or participating in a learning potential training program. What conclusions can be drawn from such score gains? Jensen's (1998a) hypothesis that the effects of training on abilities can be summarized in terms of Carroll's three-stratum hierarchical factor model was tested in a meta-analysis of test–retest data using Dutch, British, and American test batteries, and with learning potential data from South Africa using Raven's Progressive Matrices. The meta-analysis convincingly shows that test–retest score gains are not g-loaded. The findings from the learning potential study are clearly in line with this: when the attenuation caused by unreliability and other artifacts is taken into account, the correlation between the g loadings of items and the gains on items has a value somewhat comparable to the one found in the meta-analysis for test batteries. The data suggest that the g loadedness of item scores decreases after the intervention training. Te Nijenhuis et al.'s (2001) finding that practice and coaching reduced the g loadedness of their test scores strengthens the present findings using item scores. The findings also show that it is not the high-g participants who increase their scores the most, as is common in training situations, but the low-g participants. This suggests that the intervention training is not g-loaded.

Our findings fit quite well with the hierarchical model of intelligence. The generalizability of test scores resides predominantly in the g component, whereas the test-specific ability component and the narrow ability component are virtually non-generalizable. This is evidenced, for instance, by the earlier finding that adding verbal tests or numerical tests to a g score resulted in only very small incremental validity (Ree & Earles, 1991; Ree et al., 1994). Additionally, Ericsson and Lehmann (1996) reported immense gains on a memory task focusing on one narrow ability but did not find any improvement on comparable memory tasks focusing on another narrow ability. As the score gains are not related to g, the generalizable g component decreases; and since the Feuerstein training itself is quite possibly not g-loaded, it is easy to understand why the score gains did not generalize to scores on the cognitively loaded Representational Stencil Design Test. For a similar reason, the score gains did not generalize to g-loaded external criteria: the correlation of the RSPM scores with performance in the end-of-year psychology examination did not significantly improve after mediation. Reeve and Lam (2005) claimed that retesting does not change the nature of what is being tested, but our findings suggest the opposite.

19. Limitations of the studies

Our meta-analysis and our analysis of the South African study rely strongly on the method of correlated vectors (MCV), which has recently been shown to have limitations. Dolan and Lubke (2001) have shown that, when comparing groups, substantial positive vector correlations can still be obtained even when groups differ not only on g but also on factors uncorrelated with g. Ashton and Lee (2005) show that associations of a variable with non-g sources of variance can produce a vector correlation of zero even when the variable is strongly associated with g. They suggest that the g loadings of a subtest are sensitive to the nature of the other subtests in a battery, so that a specific sample of subtests may cause a spurious correlation between the vectors. Notwithstanding these limitations, studies using MCV continue to appear (see, for instance, Colom, Jung, & Haier, in press; Hartmann, Kruuse, & Nyborg, in press; Lee et al., 2006). The outcomes of our meta-analysis of a large number of studies using MCV may make an interesting contribution to the discussion of the method's limitations.

A principle of meta-analysis is that the amount of information contained in any one study is quite modest. Therefore, one should analyze all studies on one topic and correct for artifacts, which strongly increases the amount of information. The fact that our meta-analytical value of r = −1.06 is virtually identical to the theoretically expected correlation between g and d of −1.00 holds some promise that a psychometric meta-analysis of studies using MCV is a powerful way of reducing some of the limitations of MCV. An alternative methodological approach is to limit oneself to the rare datasets that allow structural equation modeling. However, from a meta-analytical point of view, these studies yield only a quite modest amount of information.
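A value such as r = −1.06 can arise because psychometric meta-analysis first averages the observed correlations, weighting by sample size, and then divides the mean by the product of artifact attenuation factors that are all smaller than 1; sampling error in the inputs means the result is not bounded by −1. The sketch shows a bare-bones Hunter and Schmidt step followed by such a correction. The three studies and the two attenuation factors are hypothetical, not the paper's data, and the function names are illustrative.

```python
def bare_bones_meta(studies):
    """Hunter-Schmidt bare-bones meta-analysis of correlations.

    studies: list of (n, r) pairs. Returns the N-weighted mean r, the
    N-weighted observed variance of r, and the variance expected from
    sampling error alone.
    """
    total_n = sum(n for n, _ in studies)
    mean_r = sum(n * r for n, r in studies) / total_n
    var_obs = sum(n * (r - mean_r) ** 2 for n, r in studies) / total_n
    mean_n = total_n / len(studies)
    var_err = (1 - mean_r ** 2) ** 2 / (mean_n - 1)
    return mean_r, var_obs, var_err


def correct_mean_r(mean_r, attenuation_factors):
    """Divide by the product of artifact attenuation factors (each in (0, 1])."""
    product = 1.0
    for a in attenuation_factors:
        product *= a
    return mean_r / product


# Hypothetical vector correlations r(g, d) from three studies, NOT the paper's data.
studies = [(120, -0.85), (60, -0.70), (300, -0.80)]
mean_r, var_obs, var_err = bare_bones_meta(studies)
rho = correct_mean_r(mean_r, [0.90, 0.85])  # hypothetical artifact factors
print(f"mean r = {mean_r:.3f}, corrected = {rho:.3f}")
```

Here the weighted mean is −.800 and the corrected value −1.046, slightly beyond −1 for the same reason a corrected meta-analytic estimate can be.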

Additional meta-analyses of studies employing MCV are necessary to establish the validity of the combination of MCV and psychometric meta-analysis. Most likely, many would agree that a high positive meta-analytical correlation between measures of g and measures of another construct implies that g plays a major role, and that a meta-analytical correlation of −1.00 implies that g plays no role. However, it is not clear what value of the meta-analytical correlation to expect from MCV when g plays only a modest role. After the present meta-analysis on a construct that clearly has an inverse relationship with g, it would be informative to carry out meta-analyses of studies on variables that are strongly linked to g and on variables that are modestly linked to g. An example of the latter would be secular score gains, which, according to Lynn's (1990) nutrition theory, should be modestly g-loaded.

The sample sizes in the South African study are not large, but still larger than those in many other studies of learning potential, where an N ≈ 10 is not unusual. A reanalysis of the many existing studies on dynamic testing could yield a meta-analysis with a large combined N. The mean posttest score was quite high, so a ceiling effect may have occurred for the White/Indian/Colored group, leading to an underestimation of the experimental score gain for this group.

Instead of testing the hypothesis with a strongly unidimensional test such as the RSPM, it would be better to use a multidimensional test. Moreover, a large sample size would allow the use of more rigorous data-analytical techniques, leading to more definitive results. However, to the best of our knowledge, datasets meeting these requirements do not exist, and the Skuy et al. study is arguably the best South African learning potential study.

20. Score gains as low-quality measures of motivation

As criterion-related validity is strongly dependent on g, te Nijenhuis et al.'s (2001) finding of lowered g loadings after training should result in lowered criterion-related validity. However, the empirical findings show the opposite: virtually all test–retest and test preparation studies on cognitive tests and scholastic aptitude tests that reported criterion-related validities for both the first and the later scores demonstrate small to modest increases in criterion-related validity for the second or third test score (see Allalouf & Ben-Shakhar, 1998; Bashi, 1976; Coyle, 2006; Hausknecht, Trevor, & Farr, 2002; Jones, 1986; Linn, 1977; Olsen & Schrader, 1959; Ortar, 1960; Powers, 1985; Reeve & Lam, 2005). In the carefully designed study of a university entrance test by Allalouf and Ben-Shakhar (1998), the experimental group received an intensive 40-h test coaching program, while the control group did not. The criterion-related validity of the retest increased for both groups. Most importantly, the increase was the same: it was not larger for the experimental group.

In a little-known but carefully designed large-scale learning potential study, Resing (1990, see Table 4.23) compared an experimental group that received a pretest, a learning potential training, and a posttest with a control group that received only the pretest and the posttest. The mean criterion-related validity of the various second scores was .62 for both the experimental and the control group. Learning potential training did not result in criterion-related validity over and above the validity resulting from simply retesting. The findings of both Resing and Allalouf and Ben-Shakhar suggest that cognitive interventions do not increase criterion-related validity more than simple retesting.

g and the personality measure conscientiousness have been shown to make an excellent combination of predictors (Schmidt & Hunter, 1998). Conscientiousness represents, among other characteristics, persistence, a will to achieve, and the ability to focus effort on the goal. A field study on test preparation using actual job applicants (Clause, Delbridge, Schmitt, Chan, & Jennings, 2001) showed that motivation to perform well on the test correlated .25 with test performance. One could speculate that score increases do not reflect a true cognitive component but rather become low-quality measures of motivation. Further, since the increase in validity due to retesting and learning potential training is modest compared with the large increase obtainable from personality questionnaires, personality testing might provide a less expensive and more accurate alternative.
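How much a second predictor adds beyond g depends both on its own validity and on its overlap with g. The standard two-predictor multiple correlation makes this concrete. The validities below (.50 for g, .30 for the second predictor) and the intercorrelations are hypothetical illustrations, not estimates from Schmidt and Hunter (1998) or from this paper.

```python
import math


def multiple_r(r_y1, r_y2, r_12):
    """Multiple correlation of a criterion with two standardized predictors."""
    r_sq = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)
    return math.sqrt(r_sq)


# Hypothetical validities, NOT values from this paper.
r_g, r_second = 0.50, 0.30
for r_12 in (0.0, 0.5):
    R = multiple_r(r_g, r_second, r_12)
    print(f"intercorrelation {r_12:.1f}: R = {R:.3f}, gain over g alone = {R - r_g:.3f}")
```

With uncorrelated predictors R rises from .50 to about .583; if the second predictor correlated .5 with g, R would be only about .503. A predictor that overlaps little with g can therefore add more validity than further cognitive measures that largely re-measure g (compare the small incremental validity of verbal and numerical tests reported by Ree and Earles).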

21. Effectiveness of various training formats

Components of the mediation training used by Skuy et al. (2002) are similar to the test training used in te Nijenhuis et al. (2001). Both the Dutch training and the South African training took 3 h, but whereas the Dutch training focused on two different test formats, the South African training dealt with only one test format. The test training by Lloyd and Pidgeon (1961) took even less time, namely two half-hour segments, each focusing on one test format. The effect sizes in all studies were roughly comparable. This suggests that the methods employed by te Nijenhuis et al. and by Lloyd and Pidgeon were more efficient than those used by Skuy et al. It is possible that the components of the mediation training that are absent from the other two training formats are not effective in raising test scores and could therefore be left out. If so, it might be possible to increase RSPM scores by one SD with a relatively simple 1-h training.

22. Generalizability of findings

Can these findings of hollow score gains after test–retest, test practice, and Mediated Learning Experience training be generalized to other studies in which training-induced score gains were found? Ericsson and Lehmann (1996) reported tremendous score increases after intensive training on numeric memory tests, but these gains did not generalize in the least to verbal memory tests. Such gains on one narrow ability do not generalize to another narrow ability clustering under the same broad ability and are therefore hollow. Similarly, Jensen (1998b) showed that score gains due to adoption were not on the g factor and were therefore most likely hollow.

Rushton (1999) argued that intergenerational score gains are not linked to g, suggesting that the Flynn effect may be empty, but he was strongly criticized by Flynn (1999, 2000). In studies on the Flynn effect, score gains found in cross-sectional studies are largest on the RSPM (Flynn, 1987). Lynn (1998) has suggested that a substantial part of these intergenerational score gains on the RSPM is generalizable (that is, it does reflect higher g), but that the remaining part is hollow and should be interpreted as schooling effects. The RSPM does require the application of the mathematical principles of addition, subtraction, progression, and the distribution of values. In the three decades (1950s–1980s) over which these increases in RSPM scores occurred, increasing proportions of 15- to 18-year-olds remained in school, where they learned math skills that they could apply to the solution of matrices problems. Our findings could be interpreted as support for Lynn's hypothesis of the partial hollowness of score gains on the RSPM. Notwithstanding the high g loading of its sum score, the RSPM is quite sensitive to test–retest effects and training effects. Some studies on the Flynn effect (Lynn & Hampson, 1986; Teasdale & Owen, 1989) show that the increase in scores is largely concentrated in the lower segments of the IQ distribution. Our finding that low scorers show the largest gains after training may additionally support the notion that part of the Flynn effect on the RSPM is hollow. Finally, Wicherts et al.'s (2004) findings show that in some of their datasets the secular score gains are most strongly linked to broad, narrow, and test-specific abilities, showing that an important part of the gains is non-generalizable.

Ceci (1991) showed that increased schooling leads to higher IQ scores, but are these gains highly specific or predominantly generalizable? It would be interesting to apply the techniques used in this study to the findings of previous intervention studies. It may be that biological interventions (such as diet, vitamin supplements, and vaccination against infectious disease), rather than psychological or educational interventions, are the most cost-effective method of producing true changes in g and broad abilities. There may be a biological barrier between the first and the second stratum that restricts the effects of behavioral interventions to narrow abilities and test specificities.

Acknowledgement

We would like to thank Mervyn Skuy for his permission to use his dataset.

Thanks to Marié de Beer, Raegan Murphy, Welko Tomic, Art Jensen, and Frank Schmidt for feedback on previous versions of this paper.

Thanks to Arne Evers, Wilma Resing (Dutch Test Committee), and Andress Kooij (Harcourt) for their help in locating test–retest studies.

References

Ackerman, P. L. (1986). Individual differences in information processing: An investigation of intellectual abilities. Intelligence, 10, 101–139.

Ackerman, P. L. (1987). Individual differences in skill learning: An integration of psychometric and information processing skills. Psychological Bulletin, 102, 3–27.

Allalouf, A., & Ben-Shakhar, G. (1998). The effect of coaching on the predictive validity of scholastic aptitude tests. Journal of Educational Measurement, 35(1), 31–47.

Ashton, M. C., & Lee, K. (2005). Problems with the method of correlated vectors. Intelligence, 33, 431–444.

Bashi, Y. (1976). Verbal and non-verbal abilities of 4th, 6th, and 8th grade students in the Arab educational system in Israel. Jerusalem: Hebrew University, School of Education.


Bennett, G. K., Seashore, H. G., & Wesman, A. G. (1974). Differential Aptitude Tests (5th ed.). Manual. New York: The Psychological Corporation.

Bleichrodt, N., Resing, W. C. M., Drenth, P. J. D., & Zaal, J. N. (1987). Intelligentie-meting bij kinderen: Empirische en methodologische verantwoording van de geReviseerde Amsterdamse Kinder Intelligentie Test [Measuring the intelligence of children: Empirical and methodological justification of the Revised Amsterdam Children Intelligence Test]. Lisse, the Netherlands: Swets.

Boeyens, J. C. A. (1989). Learning potential: An empirical investigation. Pretoria, South Africa: Human Science Research Council.

Bosch, F. (1973). Inventarisatie, beschrijving en onderzoek m.b.t. de wijzigingen van de GATB, incl. test-hertest onderzoek (No. Pz3bRp0120) [Stock-taking, description, and research concerning the modifications of the GATB, including a test–retest study]. Utrecht, the Netherlands: Nederlandse Spoorwegen.

Carroll, J. B. (1993). Human cognitive abilities: A survey of factor analysis studies. Cambridge: Cambridge University Press.

Ceci, S. J. (1991). How much does schooling influence general intelligence and its cognitive components? A reassessment of the evidence. Developmental Psychology, 27, 703–722.

Christian, K., Bachnan, H. J., & Morrison, F. J. (2001). Schooling and cognitive development. In R. J. Sternberg & E. L. Grigorenko (Eds.), Environmental effects on cognitive abilities (pp. 287–335). Mahwah, NJ: Erlbaum.

Clause, C. S., Delbridge, K., Schmitt, N., Chan, D., & Jennings, D. (2001). Test preparation activities and employment test performance. Human Performance, 14, 149–167.

Cohen, J. (1988). Statistical power analysis for the behavioral sciences. Hillsdale, NJ: Lawrence Erlbaum.

Colom, R., Jung, R. E., & Haier, R. J. (in press). Finding the g-factor in brain structure using the method of correlated vectors. Intelligence.

Covin, T. A. (1977). Stability of the WISC-R for 9-year-olds with learning difficulties. Psychological Reports, 40, 1297–1298.

Coyle, T. R. (2006). Test–retest changes on scholastic aptitude tests are not related to g. Intelligence, 34, 15–27.

Cronbach, L. J. (1990). Essentials of psychological testing. New York: HarperCollins.

de Villiers, A. B. (1999). Disadvantaged students' academic performance: Analysing the zone proximal development. Unpublished DPhil thesis, University of Cape Town, South Africa.

de Wolff, C. J., & Buiten, B. (1963). Een factoranalyse van vier testbatterijen [A factor analysis of four test batteries]. Nederlands Tijdschrift voor Psychologie, 18, 220–239.

Dolan, C. V., & Lubke, G. (2001). Viewing Spearman's hypothesis from the perspective of multigroup PCA: A comment on Schonemann's criticism. Intelligence, 29, 231–245.

Drenth, P. J. D., Petrie, J. F., & Bleichrodt, N. (1968). Handleiding bij de Amsterdamse Kinder Intelligentie Test [Manual of the Amsterdam Children Intelligence Test]. Amsterdam: Vrije Universiteit.

Elliott, C. D. (1983). British Ability Scales. Manual 2: Technical handbook. Windsor, Great Britain: NFER-Nelson.

Engelbrecht, M. (1999). Leerpotensiaal as voorspeller van akademiese sukses van universiteitsstudente [Learning potential as predictor of the academic success of university students]. Unpublished DPhil thesis, Potchefstroom University for Christian Higher Education, South Africa.

Ericsson, K. A., & Lehmann, A. C. (1996). Expert and exceptional performance: Evidence of maximal adaptation to task constraints. Annual Review of Psychology, 47, 273–305.

Evers, A., & Lucassen, W. (1991). Handleiding DAT 83: Differentiële Aanleg Testserie [Manual DAT 83: Differential Aptitude Test series]. Amsterdam: Swets.

Fleishman, E. A., & Hempel, W. E. (1955). The relation between abilities and improvement with practice in a visual discrimination reaction task. Journal of Experimental Psychology, 49, 301–312.

Flynn, J. R. (1987). Massive IQ gains in 14 nations: What IQ tests really measure. Psychological Bulletin, 101, 171–191.

Flynn, J. R. (1999). Evidence against Rushton: The genetic loading of WISC-R subtests and the causes of between-group IQ differences. Personality and Individual Differences, 26, 373–379.

Flynn, J. R. (2000). IQ gains, WISC subtests and fluid g: g theory and the relevance of Spearman's hypothesis to race. In G. R. B. J. Goode (Ed.), The nature of intelligence (pp. 202–227). New York: Wiley.

Gaydon, V. P. (1988). Predictors of performance of disadvantaged adolescents on the Soweto/Alexandra gifted child programme. Unpublished MEd dissertation, University of the Witwatersrand, South Africa.

Gottfredson, L. S. (1997). Why g matters: The complexity of everyday life. Intelligence, 24(1), 79–132.

Gottfredson, L. S. (2002). g: Highly general and highly practical. In R. J. Sternberg & E. L. Grigorenko (Eds.), The general intelligence factor: How general is it? (pp. 331–380). Mahwah, NJ: Erlbaum.

Grigorenko, E. L., & Sternberg, R. J. (1998). Dynamic testing. Psychological Bulletin, 124, 75–111.

Haeck, W., Yeld, N., Conradie, J., Robertson, N., & Shall, A. (1997). A developmental approach to mathematics testing for university admissions and course placement. Educational Studies in Mathematics, 33, 71–91.

Hartmann, P., Kruuse, N. H. S., & Nyborg, H. (in press). Testing the cross-racial generality of Spearman's hypothesis in two samples. Intelligence.

Hausknecht, J. P., Trevor, C. O., & Farr, J. L. (2002). Retaking ability tests in a selection setting: Implications for practice effects, training performance, and turnover. Journal of Applied Psychology, 87(2), 243–254.

Hunter, J. E., & Schmidt, F. L. (1990). Methods of meta-analysis. London: Sage.

Hunter, J. E., & Schmidt, F. L. (2004). Methods of meta-analysis (2nd ed.). London: Sage.

Jensen, A. R. (1980). Bias in mental testing. London: Methuen.

Jensen, A. R. (1985). The nature of the black–white difference on various psychometric tests: Spearman's hypothesis. Behavioral and Brain Sciences, 8, 193–263.

Jensen, A. R. (1998a). The g factor: The science of mental ability. London: Praeger.

Jensen, A. R. (1998b). Adoption data and two g-related hypotheses. Intelligence, 25, 1–6.

Johnson, W., Bouchard, T. J., Jr., Krueger, R. F., McGue, M., & Gottesman, I. I. (2004). Just one g: Consistent results from three test batteries. Intelligence, 32, 95–107.

Johnson, W., te Nijenhuis, J., & Bouchard, T. J., Jr. (in press). Replication of the hierarchical visual-perceptual-image rotation model in de Wolff and Buiten's (1963) battery of 46 tests of mental ability. Intelligence.

Jones, R. J. (1986). A comparison of the predictive validity of the MCAT for coached and uncoached students. Journal of Medical Education, 61, 335–338.

Kaufman, A. S., & Kaufman, N. L. (1983). K-ABC: Kaufman Assessment Battery for Children. Interpretive manual. Circle Pines, MN: AGS.


Kooij, A. P., Rolfhus, E., Wilkins, C., Yang, Z., & Zhu, J. (2005). WAIS-III Nederlandstalige bewerking: Technisch rapport hernormering [WAIS-III adaptation in Dutch: Technical report on renorming]. Amsterdam: Harcourt.

Kort, W., Schittekatte, M., Dekker, P. H., Verhaeghe, P., Compaan, E. L., Bosmans, M., & Vermeir, G. (2005). WISC-III NL: Wechsler Intelligence Scale for Children, Derde Editie NL. Handleiding en verantwoording [The Dutch WISC-III: Wechsler Intelligence Scale for Children, Third Edition for the Netherlands. Manual and justification]. Amsterdam: NIP.

Kulik, J. A., Bangert-Drowns, R. L., & Kulik, C. C. (1984). Effectiveness of coaching for aptitude tests. Psychological Bulletin, 95, 179–188.

Kulik, J. A., Kulik, C. C., & Bangert, R. L. (1984). Effects of practice on aptitude and achievement test scores. American Educational Research Journal, 21, 435–447.

Lee, K. H., Choi, Y. Y., Gray, J. R., Cho, S. H., Chae, J.-H., Lee, S., et al. (2006). Neural correlates of superior intelligence: Stronger recruitment of posterior parietal cortex. NeuroImage, 29(2), 578–586.

Linn, R. L. (1977). On the treatment of multiple scores for Law School Admission Test repeaters (Report LSAC-77-4). In Law School Admission Council, Reports of LSAC sponsored research: Volume III, 1975–1977. Princeton, NJ: Law School Admission Council.

Lipson, L. E. (1992). Relationship of static and dynamic measures to scholastic achievement of black pupils. Unpublished MEd dissertation, University of the Witwatersrand, South Africa.

Lloyd, F., & Pidgeon, D. A. (1961). An investigation into the effects of coaching on non-verbal test material with European, Indian, and African children. British Journal of Educational Psychology, 31, 145–151.

Luteijn, F., & Barelds, D. P. H. (2005). GIT2: Groninger Intelligentie Test 2 [GIT2: Groningen Intelligence Test 2]. Amsterdam: Harcourt.

Lynn, R. (1990). The role of nutrition in secular increases in intelligence. Personality and Individual Differences, 11, 273–285.

Lynn, R. (1998). In support of the nutrition theory. In U. Neisser (Ed.), The rising curve: Long-term gains in IQ and related measures (pp. 207–215). Washington, DC: American Psychological Association.

Lynn, R., Allik, J., & Irwing, P. (2004). Sex differences on three factors identified in Raven's Standard Progressive Matrices. Intelligence, 32, 411–424.

Lynn, R., & Hampson, S. (1986). The rise of national intelligence: Evidence from Britain, Japan and the USA. Personality and Individual Differences, 7, 23–32.

Matarazzo, J. D., Carmody, T. P., & Jacobs, L. D. (1980). Test–retest reliability and stability of the WAIS: A literature review with implications for clinical practice. Journal of Clinical Neuropsychology, 2(2), 89–105.

McCormick, B. K., Dunlap, W. P., Kennedy, R. S., & Jones, M. B. (1983). The effects of practice on the Armed Forces Vocational Aptitude Test Battery (Technical Report 602). US Army Research Institute for the Behavioral and Social Sciences.

Mulder, J. L., Dekker, R., & Dekker, P. H. (2004). Kaufman Intelligentietest voor adolescenten en volwassenen (KAIT): Handleiding [Kaufman Intelligence Test for adolescents and adults (KAIT): Manual]. Leiden, the Netherlands: PITS.

Murphy, R. (2002). A review of South African research in the field of dynamic assessment. Unpublished MA dissertation, University of Pretoria (available online from http://upetd.up.ac.za/thesis/available/etd-05042002-161239).

Nel, A. (1997). Die voorspelling van akademiese sukses binne konteks van 'n alternatiewe universiteitstoelatingsbeleid [The prediction of academic success within the context of an alternative policy of university admission]. Unpublished MA dissertation, Rand Afrikaans University, South Africa.

Neubauer, A. C., & Freudenthaler, H. H. (1994). Reaction time in a sentence-picture verification test and intelligence: Individual strategies and effects of extended practice. Intelligence, 19, 193–218.

Nunnally, J. C., & Bernstein, I. H. (1994). Psychometric theory (3rd ed.). New York: McGraw-Hill.

Olsen, M., & Schrader, W. B. (1959). The use of preliminary and final Scholastic Aptitude Test scores in predicting college grades (College Entrance Examination Board Research and Development Reports and Statistical Reports 59-19). Princeton, NJ: Educational Testing Service.

Ortar, G. R. (1960). Improving test validity by coaching. Educational Research, 2, 137–142.

Powers, D. E. (1985). Effects of test preparation on the validity of Graduate Admission Test. Applied Psychological Measurement, 9, 179–190.

Raven, J., Raven, J. C., & Court, J. H. (2000). Standard Progressive Matrices. Raven manual: Section 3. Oxford: Oxford Psychologists Press.

Ree, M. J., & Carretta, T. R. (1994). The correlation of general cognitive ability and psychomotor tracking tests. International Journal of Selection and Assessment, 2, 209–216.

Ree, M. J., & Earles, J. A. (1991). Predicting training success: Not much more than g. Personnel Psychology, 44, 321–332.

Ree, M. J., Earles, J. A., & Teachout, M. S. (1994). Predicting job performance: Not much more than g. Journal of Applied Psychology, 79, 518–524.

Reeve, C. L., & Lam, H. (2005). The psychometric paradox of practice effects due to retesting: Measurement invariance and stable ability estimates in the face of observed score changes. Intelligence, 33, 535–549.

Resing, W. C. M. (1990). Intelligentie en leerpotentieel: Een onderzoek naar het leerpotentieel van jonge leerlingen uit het basis- en speciaal onderwijs [Intelligence and learning potential: A study into the learning potential of young students in basic and special education]. Amsterdam, the Netherlands: Swets.

Rushton, J. P. (1999). Secular gains in IQ are not related to the g factor and inbreeding depression, unlike black–white differences: A reply to Flynn. Personality and Individual Differences, 26, 381–389.

Rushton, J. P., Skuy, M., & Bons, T. A. (2004). Construct validity of Raven's Advanced Progressive Matrices for African and non-African engineering students in South Africa. International Journal of Selection and Assessment, 12(3), 220–229.

Schmidt, F. L., & Hunter, J. E. (1998). The validity and utility of selection methods in personnel psychology: Practical and theoretical implications of 85 years of research findings. Psychological Bulletin, 124(2), 262–274.

Schmidt, F. L., & Hunter, J. E. (1999). Theory testing and measurement error. Intelligence, 27(3), 183–198.

Schmidt, F. L., & Le, H. (2004). Software for the Hunter–Schmidt meta-analysis methods. Iowa City, IA 42242: University of Iowa, Department of Management and Organization.

Schroots, J. J. F., & van Alphen de Veer, R. J. (1979). LDT: Leidse Diagnostische Test. Deel 1, Handleiding [LDT: Leiden Diagnostic Test. Part 1, Manual]. Lisse, the Netherlands: Swets.

Shochet, I. M. (1986). Manifest and potential performance in advantaged and disadvantaged students. Unpublished DPhil dissertation, University of the Witwatersrand, South Africa.

300 J. te Nijenhuis et al. / Intelligence 35 (2007) 283–300

Skuy, M., Gewer, A., Osrin, Y., Khunou, D., Fridjon, P., & Rushton, J. P. (2002). Effects of mediated learning experience on Raven's Matrices scores of African and non-African university students in South Africa. Intelligence, 30, 221–232.

Swanson, H. E., & Lussier, C. M. (2001). A selective synthesis of the experimental literature on dynamic assessment. Review of Educational Research, 71, 321–363.

Teasdale, T. W., & Owen, D. R. (1989). Continuing secular increase in intelligence and a stable prevalence of high intelligence levels. Intelligence, 13, 255–262.

Tuma, J. M., & Appelbaum, A. S. (1980). Reliability and practice effects of WISC-R IQ estimates in a normal population. Educational and Psychological Measurement, 40, 671–678.

te Nijenhuis, J., Tolboom, E., Resing, W., & Bleichrodt, N. (2004). Does cultural background influence the intellectual performance of children from immigrant groups? Validity of the RAKIT intelligence test for immigrant children. European Journal of Psychological Assessment, 20, 10–26.

te Nijenhuis, J., & van der Flier, H. (1997). Comparability of GATB scores for immigrants and majority group members: Some Dutch findings. Journal of Applied Psychology, 82, 675–687.

te Nijenhuis, J., Voskuijl, O. F., & Schijve, N. B. (2001). Practice and coaching on IQ tests: Quite a lot of g. International Journal of Selection and Assessment, 9, 302–308.

Thorndike, R. L. (1985). The central role of general ability in prediction. Multivariate Behavioral Research, 20, 241–254.

Tucker, L. R., Damarin, F., & Messick, S. (1966). A base-free measure of change. Psychometrika, 31(4), 457–473.

van der Doef, M. P., Kwint, J. M., & van der Koppel (1989). Wat leren moeilijk lerende kinderen van de WISC-R? [What do children who have difficulties in learning learn from the WISC-R?]. Kind en Adolescent, 10, 136–141.

United States Department of Labor (1970). Manual for the USTES General Aptitude Test Battery. Section III: Development. Washington, DC: United States Department of Labor.

van Geffen (1972). De betrouwbaarheid van de GATB 1002-B op brugklasniveau [The reliability of the GATB 1002-B for the first class at secondary school]. Catholic University Nijmegen, the Netherlands: Psychology of Work and Organisation.

van Haasen, P. P., de Bruyn, E. E. J., Pijl, Y. J., Poortinga, Y. H., Lutje Spelberg, H. C., Vander Steene, G., et al. (1986). WISC-R: Wechsler Intelligence Scale for Children-Revised, Nederlandstalige uitgave [WISC-R: Wechsler Intelligence Scale for Children-Revised, Dutch edition]. Lisse, the Netherlands: Swets.

Wechsler, D. (1955). Manual for the Wechsler Adult Intelligence Scale. New York: The Psychological Corporation.

Wechsler, D. (1967). Manual for the Wechsler Preschool and Primary Scale of Intelligence. New York: The Psychological Corporation.

Wechsler, D. (1974). Manual for the Wechsler Intelligence Scale for Children-Revised. New York: The Psychological Corporation.

Wechsler, D. (1981). WAIS-R manual: Wechsler Adult Intelligence Scale-Revised. New York: The Psychological Corporation.

Wechsler, D. (1997). WAIS-III: Wechsler Adult Intelligence Scale-third edition and WMS-III: Wechsler Memory Scale-third edition. Technical manual. New York: The Psychological Corporation.

Wicherts, J. W., Dolan, C. V., Oosterveld, P., van Baal, G. C. V., Boomsma, D. I., & Span, M. M. (2004). Are intelligence tests measurement invariant over time? Investigating the nature of the Flynn effect. Intelligence, 32(5), 509–537.

Yeld, N., & Haeck, W. (1997). Educational histories and academic potential: Can tests deliver? Assessment and Evaluation in Higher Education, 22, 5–16.

Zaaiman, H. (1998). Selecting students for Mathematics and Science: The challenge facing higher education in South Africa. Pretoria, South Africa: HSRC Publishers.

Zaaiman, H., van der Flier, H., & Thijs, G. D. (2001). Dynamic testing in selection for an educational programme: Assessing South African performance on the Raven Progressive Matrices. International Journal of Selection and Assessment, 9, 258–269.

Zolezzi, S. A. (1992). Alternative selection measures for university undergraduate admissions. Unpublished MEd dissertation, University of the Witwatersrand, South Africa.

Zolezzi, S. A. (1995). The effectiveness of dynamic assessment as an alternative aptitude testing strategy. Unpublished DPhil dissertation, University of South Africa, South Africa.


that were highly comparable with regard to age, and for the comparisons of adults we chose samples that were roughly comparable with regard to age.

In the GATB manual (1970, ch. 15), 13 combinations of two studies are described in which large samples of men and women who were comparable with respect to age and background took the same GATB subtests. The average unweighted correlation between the d vectors of men and women is .83 (total N = 3760). In the GATB manual (1970, ch. 20), three combinations of three studies are described in which very large samples of boys and girls in the same grade of secondary school took the same GATB subtests. This yielded correlations between the d vectors of .99, .98, and .94, respectively (total N = 20,541). Together, van Geffen (1972) and Bosch (1973) report three Dutch GATB test–retest studies on children in secondary school, resulting in three comparisons between d vectors. The average N-weighted correlation between the d vectors is .47 (total N = 127). Vectors of score gains from two different datasets on the WISC-R were compared: Tuma and Appelbaum (1980) tested children with an average age of 10, and Wechsler (1974) tested 10- and 11-year-olds. The correlation between the two d vectors is .71 (total N = 147). Comparison of vectors of score gains from datasets on the DAT (Bennett, Seashore, & Wesman, 1974) resulted in correlations of .78 and .73, respectively, so an average r of .76 (total N = 254). So it appears that d vectors are quite reliable, especially when the samples are very large. We estimated the reliabilities of the d vectors in the database using data from the samples described in this paragraph.
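The reliability estimates above are summaries of correlations between pairs of d vectors. A minimal sketch of that bookkeeping, assuming plain Pearson correlations and either an unweighted or an N-weighted arithmetic mean (the text does not say whether Fisher's z transformation was used); the helper names `pearson` and `mean_correlation` are hypothetical:

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length vectors, e.g. two d vectors."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("vectors must have equal length of at least 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mean_correlation(rs, ns=None):
    """Unweighted mean of correlations, or N-weighted mean when sample sizes are given."""
    if ns is None:
        return sum(rs) / len(rs)
    return sum(r * n for r, n in zip(rs, ns)) / sum(ns)


dat_mean = mean_correlation([0.78, 0.73])          # the two DAT comparisons
gatb_mean = mean_correlation([0.99, 0.98, 0.94])   # the three GATB boys-vs-girls comparisons
print(round(dat_mean, 3), round(gatb_mean, 2))
```

Averaging the two DAT comparisons gives .755, which rounds to the .76 reported above; the three GATB comparisons average .97.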

8.4. Correction for restriction of range of g loadings

The value of $r_{gd}$ is attenuated by the restriction of range of g loadings in many of the standard test batteries: the most highly g-loaded batteries tend to have the smallest range of variation in the subtests' g loadings. Jensen (1998a, pp. 381–382) shows that restriction in g loadedness strongly attenuates the correlation between g loadings and standardized group differences. Hunter and Schmidt (1990, pp. 47–49) state that the solution to range variation is to define a reference population and express all correlations in terms of that reference population. The Hunter and Schmidt meta-analytical program computes what the correlation in a given population would be if the standard deviation were the same as in the reference population. The standard deviations can be compared by dividing the study population standard deviation by the reference group population standard deviation, that is, $u = SD_{\mathrm{study}}/SD_{\mathrm{ref}}$. As the reference we took the tests that are broadly regarded as exemplary for the measurement of the intelligence domain, namely the various versions of the Wechsler tests for children. The average standard deviation of g loadings of the various Dutch and US versions of the WISC-R and the WISC-III was 0.128, so the SD of g loadings of every test battery was compared to this average SD of g loadings in the Wechsler tests for children. This resulted in some batteries, such as the GATB, having a value of u larger than 1.00.
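The text relies on the Schmidt and Le (2004) program and does not print the correction formula itself. As an illustration only, the sketch below uses the textbook Thorndike Case II formula for direct range restriction, rewritten in terms of u = SD_study / SD_ref; whether the program applies exactly this form is an assumption, and the battery in the usage lines is hypothetical.

```python
from math import sqrt

SD_REF = 0.128  # mean SD of subtest g loadings in the Dutch and US WISC-R / WISC-III


def u_ratio(sd_study, sd_ref=SD_REF):
    """u = SD_study / SD_ref: spread of g loadings relative to the reference batteries."""
    return sd_study / sd_ref


def correct_for_range(r, u):
    """Thorndike Case II correction written in terms of u (assumed form, see lead-in).

    u < 1 (narrower spread than the reference) enlarges |r|; u > 1 (wider spread,
    as reported for the GATB) shrinks |r|; u == 1 leaves r unchanged.
    """
    return r / sqrt(u ** 2 + r ** 2 * (1 - u ** 2))


# Hypothetical battery: SD of g loadings .064 (so u = .5) and an observed r_gd of -.50
u = u_ratio(0.064)
print(round(u, 2), round(correct_for_range(-0.50, u), 3))
```

The design choice worth noting is the direction of the correction: because u is defined relative to the Wechsler reference rather than to an unrestricted population, batteries with a wider spread than the reference (u > 1) are corrected toward smaller absolute correlations.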

8.5. Correction for deviation from perfect construct validity

The deviation from perfect construct validity in g attenuates the value of $r_{gd}$. In making up any collection of cognitive tests, we do not have a perfectly representative sample of the entire universe of all possible cognitive tests. So any one limited sample of tests will not yield exactly the same g as any other limited sample. The sample values of g are affected by psychometric sampling error, but the fact that g is very substantially correlated across different test batteries implies that the differing obtained values of g can all be interpreted as estimates of a “true” g. The value of $r_{gd}$ is attenuated by psychometric sampling error in each of the batteries from which a g factor has been extracted.

The more tests and the higher their g loadings, the higher the g saturation of the composite score. The Wechsler tests have a large number of subtests with quite high g loadings, resulting in a highly g-saturated composite score. Jensen (1998a, pp. 90–91) states that the g score of the Wechsler tests correlates more than .95 with the tests' IQ score. However, shorter batteries with a substantial number of tests with lower g loadings will lead to a composite with a somewhat lower g saturation. Jensen (1998a, ch. 10) states that the average g loading of an IQ score as measured by various standard IQ tests is in the +.80s. When we take this value as an indication of the degree to which an IQ score is a reflection of “true” g, we can estimate that a test's g score correlates about .85 with “true” g. As g loadings are the correlations of tests with the g score, it is most likely that most empirical g loadings will underestimate “true” g loadings, so empirical g loadings correlate about .85 with “true” g loadings. As the Schmidt and Le computer program only includes corrections for the first four artifacts, the correction for deviation from perfect construct validity was carried out on the value of $r_{gd}$ after correction for the first four artifacts. To limit the risk of overcorrection, we conservatively chose the value of .90 for the correction.

Table 1. Dutch, British, and US studies of correlations between g loadings and gain scores

Reference                                 Test          r      N   Information
Drenth et al. (1968)                      AKIT       −.57    100   Primary-school children
van Geffen (1972)                         GATB       −.45     42   Secondary-school children
                                                     −.21     42
Bosch (1973)                              GATB       −.07     43   Secondary-school children
Schroots and van Alphen de Veer (1979)    LDT        −.42     96   Pre-school and secondary-school children
Bleichrodt et al. (1987)                  RAKIT       .09     49   Pre-school children
                                                     −.25     51   Primary-school children
                                                     −.21     49   Primary-school children
van der Doef et al. (1989)                WISC-R     −.69     22   Primary-school children with learning problems
Mulder et al. (2004)                      KAIT       −.23     46   Secondary-school children + young adults
                                                     −.42     25   Adults
Kort et al. (2005)                        WISC-III   −.15     42   Primary-school children
                                                     −.26     67   Primary-school children
                                                     −.46     39   Secondary-school children
Luteijn and Barelds (2005)                GIT2       −.51     44   Adults
Kooij et al. (2005)                       WAIS-III   −.63     60   Adults
Elliott (1983)                            BAS        −.65     60   Primary-school children
Wechsler (1967)                           WPPSI      −.46     50   Pre-school children
United States Department of Labor (1970)  GATB       −.35    156   Office applicants
                                                     −.66    605   Male high school seniors
                                                     −.70    554   Female high school seniors
                                                     −.58    223   Males, 1-day interval
                                                     −.41    186   Females, 1-day interval
                                                     −.50    202   Males, 2-week interval
                                                     −.52    152   Females, 2-week interval
                                                     −.67    156   Males, 6-week interval
                                                     −.61    168   Females, 6-week interval
                                                     −.43    176   Males, 13-week interval
                                                      .02    149   Females, 13-week interval
                                                     −.62    157   Males, 26-week interval
                                                     −.32    136   Females, 26-week interval
                                                     −.69    119   Males, 1-year interval
                                                     −.31    183   Females, 1-year interval
                                                     −.96    118   Males, 2-year interval
                                                     −.75    170   Females, 2-year interval
                                                     −.75    123   Males, 3-year interval
                                                     −.48    183   Females, 3-year interval
                                                     −.92   3398   Boys, secondary school
                                                     −.92   3680   Girls, secondary school
                                                     −.91   3348   Boys, secondary school
                                                     −.91   3491   Girls, secondary school
                                                     −.84   3229   Boys, secondary school
                                                     −.87   3395   Girls, secondary school
Wechsler (1974)                           WISC-R     −.48     97   Primary-school children
                                                     −.66    102   Primary-school children
                                                     −.21    104   Secondary-school children
Bennett et al. (1974)                     DAT        −.79     92   Boys, secondary school
                                                     −.53     81   Girls, secondary school
                                                     −.29     81   Boys, secondary school
                                                     −.62    100   Girls, secondary school
Covin (1977)                              WISC-R     −.57     30   Primary-school children with learning problems
Tuma and Appelbaum (1980)                 WISC-R     −.08     45   Primary- and secondary-school children
Matarazzo et al. (1980)                   WAIS       −.10     29   Young males
Wechsler (1981)                           WAIS-R     −.64     71   Adults
                                                     −.48     48   Adults
McCormick et al. (1983)                   ASVAB      −.73     57   Adults
Kaufman and Kaufman (1983)                K-ABC      −.27     46   Pre-school children
                                                     −.18     36   Pre- and primary-school children
                                                     −.22     70   Primary-school children
Wechsler (1997)                           WAIS-III   −.45    100   Young adults
                                                     −.57    102   Adults
                                                     −.51    104   Adults
                                                      .03     88   Adults
Reeve and Lam (2005)                      EAS        −.34    123   Undergraduate students

Note. In general, the g loadings were based on the correlation matrix taken from the manuals containing the test–retest studies, or on the correlation matrix based on the largest sample size we could find. The sources of the g loadings, where not taken from the manual containing the test–retest study, are as follows. van Geffen (1972) and Bosch (1973): de Wolff and Buiten (1963); see also Johnson, te Nijenhuis, and Bouchard (in press). Bleichrodt et al. (1987): te Nijenhuis et al. (2004), who used the same data on which the RAKIT manual is based. van der Doef, Kwint, and van der Koppel (1989): Dutch WISC-R manual. Elliott (1983): Table 9.8, age 9:0–9:11 years. US Dept. of Labor's GATB (1970): Jensen (1985, p. 214), using the largest correlation matrix in the GATB manual. Wechsler (1974), Covin (1977), and Tuma and Appelbaum (1980): Jensen (1985, p. 214, first study). Bennett et al. (1974): average of four highly similar correlation matrices. Matarazzo et al. (1980): Wechsler's (1955, p. 17) Table 8, for ages 25–34. McCormick et al. (1983): Ree and Carretta (1994). Reeve and Lam (2005) used SEM analyses with item parcels instead of full-scale scores to compute g loadings; the average g loading of all the item parcels for a specific subtest was taken as the g loading of that subtest.

9. Results

The results of the studies on the correlation between g loadings and gain scores are shown in Table 1. The table gives data derived from 64 studies, with participants numbering a total of 26,990. For each study it gives the reference, the cognitive ability test used, the correlation between g loadings and gain scores, the sample size, and background information. It is clear that virtually all correlations are negative and that the few positive correlations are very small.
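As a check on the first columns of Table 2, the sample-size-weighted mean correlation can be recomputed from the Table 1 entries. The sketch below is only the bare-bones Hunter–Schmidt step (sampling error, no corrections for unreliability or range restriction), so its variance percentage is not comparable to the %VE column of Table 2. It also assumes that the three outliers dropped in Table 2 are the three positive correlations; their sample sizes (49 + 149 + 88 = 286) match the drop in N from 26,990 to 26,704.

```python
from math import sqrt

# (r, N) pairs transcribed from Table 1, in table order
TABLE_1 = [
    (-0.57, 100), (-0.45, 42), (-0.21, 42), (-0.07, 43), (-0.42, 96),  # AKIT, GATB, LDT
    (0.09, 49), (-0.25, 51), (-0.21, 49),                                # RAKIT
    (-0.69, 22), (-0.23, 46), (-0.42, 25),                               # WISC-R, KAIT
    (-0.15, 42), (-0.26, 67), (-0.46, 39),                               # WISC-III
    (-0.51, 44), (-0.63, 60), (-0.65, 60), (-0.46, 50),                  # GIT2, WAIS-III, BAS, WPPSI
    (-0.35, 156), (-0.66, 605), (-0.70, 554),                            # GATB (US Dept. of Labor)
    (-0.58, 223), (-0.41, 186), (-0.50, 202), (-0.52, 152),
    (-0.67, 156), (-0.61, 168), (-0.43, 176), (0.02, 149),
    (-0.62, 157), (-0.32, 136), (-0.69, 119), (-0.31, 183),
    (-0.96, 118), (-0.75, 170), (-0.75, 123), (-0.48, 183),
    (-0.92, 3398), (-0.92, 3680), (-0.91, 3348),
    (-0.91, 3491), (-0.84, 3229), (-0.87, 3395),
    (-0.48, 97), (-0.66, 102), (-0.21, 104),                             # WISC-R (Wechsler, 1974)
    (-0.79, 92), (-0.53, 81), (-0.29, 81), (-0.62, 100),                 # DAT
    (-0.57, 30), (-0.08, 45), (-0.10, 29), (-0.64, 71), (-0.48, 48),     # WISC-R, WAIS, WAIS-R
    (-0.73, 57), (-0.27, 46), (-0.18, 36), (-0.22, 70),                  # ASVAB, K-ABC
    (-0.45, 100), (-0.57, 102), (-0.51, 104), (0.03, 88),                # WAIS-III
    (-0.34, 123),                                                        # EAS
]


def bare_bones(studies):
    """N-weighted mean r, N-weighted SD of r, and percent of observed variance
    expected from sampling error alone (bare-bones meta-analysis)."""
    total_n = sum(n for _, n in studies)
    mean_r = sum(r * n for r, n in studies) / total_n
    var_r = sum(n * (r - mean_r) ** 2 for r, n in studies) / total_n
    mean_n = total_n / len(studies)
    var_e = (1 - mean_r ** 2) ** 2 / (mean_n - 1)  # expected sampling-error variance
    return {
        "k": len(studies),
        "n": total_n,
        "mean_r": mean_r,
        "sd_r": sqrt(var_r),
        "pct_sampling_error": 100 * var_e / var_r,
    }


all_studies = bare_bones(TABLE_1)
# Assumption (see lead-in): the three outliers are the three positive correlations
trimmed = bare_bones([(r, n) for r, n in TABLE_1 if r < 0])
print(all_studies)
print(trimmed)
```

The weighted means come out at about −.80 for all 64 data points and about −.81 without the three positive ones, in line with the r column of Table 2.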

Table 2 shows the results of the psychometric meta-analysis of the 64 data points. From left to right, it shows the number of correlation coefficients (K), the total sample size (N), the mean observed correlation (r) and its standard deviation (SDr), the true correlation one can expect once artifactual error from unreliability in the g vector and the d vector and from range restriction in the g vector has been removed (ρ), and its standard deviation (SDρ). The next two columns present the percentage of variance explained by artifactual errors (%VE) and the 95% credibility interval (95% CI). This interval denotes the values one can expect for ρ in 19 out of 20 cases.

Table 2. Meta-analysis results for correlations between g loadings and gain scores after corrections for reliability and restriction of range

Studies included      K        N       r    SDr       ρ    SDρ    %VE   95% CI
All                  64   26,990    −.80    .20    −.95    .11     81   −0.74 to −1.16
All − 3 outliers     61   26,704    −.81    .18    −.95    .03     99   −0.91 to −1.00

Note. K = number of correlations; N = total sample size; r = mean observed correlation (sample-size weighted); SDr = standard deviation of observed correlations; ρ = true correlation (observed correlation corrected for unreliability and range restriction); SDρ = standard deviation of true correlations; %VE = percentage of variance accounted for by artifactual errors; 95% CI = 95% credibility interval.

The large number of data points and the very large sample size indicate that we can have confidence in the outcomes of this meta-analysis. The estimated true correlation has a value of −.95, and 81% of the variance in the observed correlations is explained by artifactual errors. However, Hunter and Schmidt (1990) state that extreme outliers should be left out of the analyses, because they are most likely the result of errors in the data. They also argue that strong outliers artificially inflate the SD of effect sizes and thereby reduce the amount of variance that artifacts can explain. We chose to leave out three outliers (more than 4 SD from the average r and more than 8 SD from ρ), comprising 1% of the research participants. This resulted in no change in the value of the true correlation, a large decrease (74%) in the SD of ρ, and a large increase (22%) in the amount of variance in the observed correlations explained by artifacts. So when the three outliers are excluded, artifacts explain virtually all of the variance in the observed correlations. Finally, a correction for deviation from perfect construct validity in g was applied, using a conservative value of .90. This resulted in a value of −1.06 for the final estimated true correlation between g loadings and score gains. Applying several corrections in a meta-analysis may lead to correlations larger than 1.00 or smaller than −1.00, as is the case here. Percentages of variance accounted for by artifacts larger than 100% are also not uncommon in psychometric meta-analysis, and such out-of-range values also occur in other methods of statistical estimation (see Hunter & Schmidt, 1990, pp. 411–414, for a discussion).
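The final step is simple arithmetic: dividing the corrected ρ of −.95 by the assumed .90 correlation between empirical and “true” g loadings reproduces the reported −1.06. A sketch, with a hypothetical helper name:

```python
RHO_CORRECTED = -0.95       # true correlation after the first four corrections (Table 2)
CONSTRUCT_VALIDITY = 0.90   # conservative correlation between empirical and "true" g loadings


def correct_for_construct_validity(rho, validity=CONSTRUCT_VALIDITY):
    """Divide a corrected correlation by the assumed construct validity of g."""
    return rho / validity


final_rho = correct_for_construct_validity(RHO_CORRECTED)
print(round(final_rho, 2))  # -1.06
# With the less conservative .85 estimate from Section 8.5 the value moves further from zero
print(round(correct_for_construct_validity(RHO_CORRECTED, 0.85), 2))  # -1.12
```

Because the divisor is below 1, any corrected ρ beyond about −.90 is pushed past −1.00, which is why the estimate falls outside the admissible range.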

10. Discussion

A large-scale meta-analysis of 64 test–retest studies shows that, after corrections for several artifacts, there is an estimated true correlation of −1.06 between the g loadings of tests and score gains, and virtually all of the variance in observed correlations is attributable to these artifacts. As the artifacts explain virtually all the variance in the effect sizes, the other dimensions on which the studies differ, such as the age of the test takers, the test–retest interval, the test used, and whether the samples had average IQs or learning problems, play no role at all.

The estimated true correlation of −1.06 is the result of various corrections for artifacts that attenuate the correlations. The estimated values of the artifacts may underestimate or overestimate their population values; therefore, estimates of true effect sizes may overestimate or underestimate the population value of the effect size. As a solution to this problem, Hunter and Schmidt (2004) suggest carrying out several meta-analyses on the same construct and taking the average estimated effect size of all of them. The general idea is that meta-analysis is a powerful research tool but does not give perfect outcomes.

A correlation of −1.06 falls outside the range of admissible values of a correlation, but one has to distinguish between the meta-analytical estimate of the true correlation between g and d and the true correlation between g and d itself. We interpret the meta-analytical estimate of −1.06 as meaning that the true correlation between g and d is −1.00. A correlation of −1.00 means that there is an inverse relationship between g and score gains: the tests with the highest g loadings show the smallest gains. The most straightforward interpretation of this very large negative correlation is that there is no g saturation in test–retest gain scores.

11. The South African learning potential study

In a carefully carried-out study, Skuy et al. (2002) used a dynamic testing procedure to see whether it would improve the scores of Black South African students on Raven's Standard Progressive Matrices (RSPM). The Bantu Education Act of 1954 established a discriminatory educational system characterized by poorly qualified teachers, sparsely equipped and funded schools, and generally poor quality. Most Black students in the sample had not received the same quality of education as White students. Black, White, Indian, and Colored research participants took the RSPM on two occasions, and in between, randomly constituted experimental groups were exposed to the Mediated Learning Experience. Both the Black South African group and the group consisting of White, Indian, and Colored South Africans improved over their baseline on the RSPM, and the Black group showed greater improvement.

The value of these cognitive interventions increases when the score gains transfer to other tests and to external criteria, such as school or work achievement. Therefore, the research participants also took Feuerstein's Representational Stencil Design Test as a transfer measure. The subject is presented with a stencil of a geometric design and then asked to point out which stencils need to be used, and in what sequence, to construct an identical design. Like the RSPM, the Stencils test requires representational/abstract thinking, but the training on the RSPM showed little transfer to it. Moreover, the correlation of the RSPM scores with performance in the end-of-year psychology examination did not significantly improve after mediation. Once again, the score gains were empty: they did not generalize. Skuy et al. go on to ask what it was that their interventions improved. Professor Skuy made his data accessible to the present authors, so we could perform additional analyses.

12. Sample

The data from Skuy et al. (2002) were used, with the exception of data from three research participants whose pretest IQ scores were extremely low (more than 3 SDs below the group mean). Ninety-five university students in psychology, aged 16 to 29 (mean age = 20, SD = 2.3; 25 males, 70 females), participated in this study. They were 66 Black students (20 males, 46 females) and 29 White (20), Indian (6), and Colored (3) students (5 males, 24 females). The mean age of the Black group was 20 (SD = 2.5) and that of the White/Indian/Colored group 19 years (SD = 1). Subjects were randomly assigned to the experimental group (n = 55) and to the control group (n = 40).


13. Procedure

The students participated in pre- and posttest phases, with a group intervention in between. The study focused on improvement in scores on the RSPM, using Set Variations II of the Learning Propensity Assessment Device as the mediation task. Mediation training took 3 h and was conducted by three experienced psychologists with the assistance of six postgraduate psychology students. A detailed description is given in Skuy et al. (2002).

14. Measures and cognitive intervention

Raven's Standard Progressive Matrices consists of 60 items (divided into 5 sets of 12 items) designed to measure the ability to form comparisons, to reason by analogy, and to organize spatial information into related wholes. It has been established as one of the purest measures of g (Jensen, 1998a). Skuy et al. (2002) found no evidence for test bias against Blacks in South African education, and Rushton, Skuy, and Bons (2004) showed that the Raven's gave comparable predictive validities for students from various groups. Cross-cultural testing research has clearly shown that insufficient proficiency in the language of the test can lead to biased assessments in tests with a strong verbal component; however, the Raven's is a non-verbal test.

The Learning Propensity Assessment Device consists of 14 exercises. Each exercise contains an initial mediation task. Subsequent tasks increase in complexity and novelty and aim to assist the learner in achieving mastery over the task. The purpose of mediation is to help the learner develop the appropriate cognitive strategies and functions needed for the successful completion of the task. Set Variations II of the Learning Propensity Assessment Device consists of five sets of items, which comprise variations of Sets C, D, and E of the RSPM. Each set of variations contains a learning task for the purpose of initial mediation, followed by a series of progressively more difficult variations to which the skills learned must be applied. Mediation involves discussing with groups how to define the problem to be solved, focus on the task, set rules, regulate problem-solving behavior, and identify the correct sequence of logical steps needed to solve the task. Mediation also involves helping the subject to develop appropriate concepts, verbal tools, and insights in relation to the task. A detailed description is given in Skuy et al. (2002).

15. Statistical analyses

Although the Skuy et al. study is among the South African learning potential studies with the largest sample size, the N is not large. We therefore chose basic statistical analyses.

15.1. Descriptive statistics

Means, standard deviations, and reliabilities were computed for the various groups. With regard to measures of effect size, Hunter and Schmidt (1990, p. 271) advise choosing the estimates of variance with the least error. Because repeated test taking tends to change the size of the SD (Ackerman, 1987), we chose the SD of the pretests for the denominator. The correlation between scores before and after the training was computed to see whether the training had an effect on the rank order of individuals' scores.
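With the pretest SD as the denominator, the group effect sizes follow directly from the means and SDs reported in Table 4. A minimal sketch (the dictionary layout and function name are illustrative, not taken from the study):

```python
# Table 4 values per group: (pretest mean, posttest mean, pretest SD)
TABLE_4 = {
    "Black experimental": (43.78, 50.10, 6.64),
    "Black control": (45.46, 48.35, 6.69),
    "Other experimental": (50.20, 55.80, 6.05),
    "Other control": (52.71, 55.36, 3.45),
}


def effect_size(pre_mean, post_mean, pre_sd):
    """Standardized gain with the pretest SD in the denominator."""
    return (post_mean - pre_mean) / pre_sd


for group, (pre, post, sd) in TABLE_4.items():
    d = effect_size(pre, post, sd)
    # Express the gain on an IQ metric (SD = 15) as well
    print(f"{group}: d = {d:.2f}, about {15 * d:.0f} IQ points")
```

The loop reproduces the Table 4 effect sizes (0.95, 0.43, 0.93, 0.77); on the IQ metric the Black experimental gain is about 14 points, consistent with the figure cited in the discussion.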

15.2. Correlation between score gains and g loadedness

Because our sample was not large and quite specific, estimates of g loadedness were taken from Lynn, Allik, and Irwing's (2004) item analysis of the RSPM in Estonia, which used a large (N = 2735), nationally representative sample. The same reasoning as in psychometric meta-analysis applies, namely that larger samples give better estimates of g loadings than smaller samples. In a hierarchical factor analysis of the items using structural equation modeling, Lynn et al. computed g loadings for 52 of the 60 items. In the present study, Pearson correlations were calculated between the g loadings of these 52 items and the effect sizes on these items.
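The item-level analysis pairs each item's g loading with its standardized gain. A sketch under stated assumptions: responses are 0/1 matrices (rows = participants, columns = items), item SDs are population SDs, items without pretest variance or without a published loading are skipped (the text does not say how such items were handled), and all names and the toy data are hypothetical.

```python
from math import sqrt
from statistics import mean, pstdev


def item_effect_sizes(pre, post):
    """Standardized gain per item: (posttest p - pretest p) / pretest SD.

    Items with zero pretest variance get None, because d is undefined for them.
    """
    ds = []
    for j in range(len(pre[0])):
        pre_j = [row[j] for row in pre]
        post_j = [row[j] for row in post]
        sd = pstdev(pre_j)
        ds.append(None if sd == 0 else (mean(post_j) - mean(pre_j)) / sd)
    return ds


def pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def gain_g_correlation(g_loadings, ds):
    """Correlate g loadings with item effect sizes, skipping items lacking either value."""
    pairs = [(g, d) for g, d in zip(g_loadings, ds) if g is not None and d is not None]
    gs, dvals = zip(*pairs)
    return pearson(gs, dvals)


# Toy data: 4 participants x 3 items, with made-up g loadings
pre = [[1, 0, 0], [1, 1, 0], [0, 1, 0], [0, 0, 1]]
post = [[1, 1, 0], [1, 1, 1], [0, 1, 1], [1, 1, 1]]
item_ds = item_effect_sizes(pre, post)
r = gain_g_correlation([0.7, 0.5, 0.3], item_ds)
print([round(d, 2) for d in item_ds], round(r, 2))
```

In the toy example the least g-loaded item gains most, so the correlation comes out strongly negative; the real analysis would use the 52 Lynn et al. loadings in place of the made-up list.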

15.3. g loadings

The RSPM consists of dichotomous items, so we computed a matrix of polychoric correlations (Nunnally & Bernstein, 1994). A principal axis factor analysis was carried out, and the percentage of variance explained by the first unrotated factor was taken as an estimate of g loadedness. Because sample size was limited, we collapsed the experimental and the control groups.
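The polychoric step needs a dedicated estimator and is not shown here. Taking the correlation matrix as given, a one-factor principal axis factoring can be sketched in plain Python. Assumptions: starting communalities are the largest absolute off-diagonal correlation in each row (squared multiple correlations are a common alternative), the leading eigenpair is found by power iteration, and only one factor is extracted, so the result approximates rather than reproduces the first unrotated factor of a multi-factor solution.

```python
from math import sqrt


def leading_eigenpair(m, steps=1000, tol=1e-12):
    """Largest-magnitude eigenvalue and unit eigenvector of a symmetric matrix (power iteration)."""
    p = len(m)
    vec = [1.0 / sqrt(p)] * p
    eigval = 0.0
    for _ in range(steps):
        w = [sum(m[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = sqrt(sum(x * x for x in w))
        vec_new = [x / norm for x in w]
        # Rayleigh quotient of the normalized vector
        val_new = sum(vec_new[i] * sum(m[i][j] * vec_new[j] for j in range(p)) for i in range(p))
        converged = abs(val_new - eigval) < tol
        vec, eigval = vec_new, val_new
        if converged:
            break
    if sum(vec) < 0:  # eigenvector sign is arbitrary; make loadings mostly positive
        vec = [-x for x in vec]
    return eigval, vec


def first_factor_paf(corr, max_iter=500, tol=1e-8):
    """One-factor principal axis factoring.

    Returns (loadings, percent of total variance explained by the factor).
    """
    p = len(corr)
    # Starting communalities: largest absolute off-diagonal correlation per row
    h2 = [max(abs(corr[i][j]) for j in range(p) if j != i) for i in range(p)]
    for _ in range(max_iter):
        reduced = [[h2[i] if i == j else corr[i][j] for j in range(p)] for i in range(p)]
        eigval, vec = leading_eigenpair(reduced)
        loadings = [v * sqrt(max(eigval, 0.0)) for v in vec]
        new_h2 = [ld * ld for ld in loadings]
        done = max(abs(a - b) for a, b in zip(new_h2, h2)) < tol
        h2 = new_h2
        if done:
            break
    # Sum of squared loadings (the factor's eigenvalue) over the number of variables
    return loadings, 100 * sum(h2) / p


# Hypothetical equicorrelated matrix (all r = .50) for four items
R = [[1.0 if i == j else 0.5 for j in range(4)] for i in range(4)]
loadings, pct = first_factor_paf(R)
print([round(ld, 3) for ld in loadings], round(pct, 1))
```

For this matrix the exact one-factor solution has loadings of about .71 on every item and explains 50% of the total variance, which the sketch recovers; a real RSPM matrix would show the much lower percentages reported in Section 16.3.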

15.4. Correlation between sum scores and score gains

We tested whether low-g individuals improved their scores more than high-g individuals by correlating gain scores with pretest RSPM scores for each of the four research groups. As gain scores tend to be negatively correlated with pretest scores as a function of unreliability (see Cronbach, 1990; Nunnally & Bernstein, 1994), we corrected the correlations using Tucker, Damarin, and Messick's (1966) formula 63. Using this formula, one adds to each correlation the term $(SD_{\mathrm{pretest}}/SD_{\mathrm{gain}})(1 - r_{xx})$, where $r_{xx}$ is the reliability of the pretest.

Table 3. Proportion of sample selecting the correct answer on items of Raven's Standard Progressive Matrices, by group

Set A                 Set B                 Set C                 Set D                 Set E
Item  Black  Other    Item  Black  Other    Item  Black  Other    Item  Black  Other    Item  Black  Other
   1   1.00   1.00      13   1.00   1.00      25   1.00    .97      37   1.00   1.00      49    .74    .90
   2    .97   1.00      14   1.00   1.00      26    .96   1.00      38    .99   1.00      50    .64    .90
   3    .97   1.00      15   1.00   1.00      27    .96   1.00      39    .89   1.00      51    .79    .97
   4   1.00    .97      16    .91    .97      28    .86    .93      40    .92   1.00      52    .56    .83
   5   1.00   1.00      17    .96    .97      29    .94    .97      41    .96   1.00      53    .52    .83
   6    .99   1.00      18    .85   1.00      30    .76    .83      42    .92   1.00      54    .35    .76
   7    .94    .97      19    .77    .66      31    .88    .97      43    .77   1.00      55    .42    .79
   8    .91    .93      20    .79    .97      32    .50    .79      44    .76    .93      56    .21    .69
   9   1.00    .97      21    .83    .97      33    .74    .90      45    .71    .97      57    .30    .41
  10    .91    .97      22    .92   1.00      34    .61    .79      46    .79    .93      58    .12    .41
  11    .83    .90      23    .80    .90      35    .53    .69      47    .29    .41      59    .02    .17
  12    .68    .83      24    .59    .83      36    .06    .35      48    .26    .38      60    .11    .21

Note. Other = White, Indian, and Colored.
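The Tucker, Damarin, and Messick correction described in Section 15.4 is easy to apply once the pretest SD, the gain-score SD, and the pretest reliability are known. Under classical test theory, pretest measurement error enters the gain score with the opposite sign, which biases the pretest–gain correlation downward by roughly (SD_pretest / SD_gain)(1 − r_xx); adding that term back gives the sketch below. The SD of the gain scores is not reported in the text, so the usage value is hypothetical and the result is not meant to match Section 16.4.

```python
def tucker_corrected_r(r_pre_gain, sd_pretest, sd_gain, rel_pretest):
    """Remove the negative bias that pretest measurement error puts into a
    pretest-gain correlation by adding (SD_pretest / SD_gain) * (1 - reliability)."""
    return r_pre_gain + (sd_pretest / sd_gain) * (1 - rel_pretest)


# Hypothetical inputs: observed r = -.60, pretest SD 6.64 and alpha .76 (values that
# appear in the tables and text), gain-score SD 4.0 (made up; not reported in the text)
print(round(tucker_corrected_r(-0.60, 6.64, 4.0, 0.76), 3))
```

With a perfectly reliable pretest the term vanishes and the observed correlation is returned unchanged.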

16. Results

16.1. Descriptive statistics

Internal consistencies (Cronbach's αs) of the RSPM ranged from .76 to .86 for the pre- and posttests, respectively. Table 3 shows the proportion of each group that selected the correct answer on each of the 60 items of the pretest. Across the 60 items, the order of the p values was almost identical for Blacks and White/Indian/Coloreds (r = .92, p = .00).
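Because Table 3 lists all 60 p values for both groups, the agreement reported above can be checked directly. The sketch computes a Pearson correlation across items; the text does not say whether a Pearson or a rank-order coefficient was used, and because the tabled proportions are rounded the result should come out close to, rather than exactly at, the reported .92.

```python
from math import sqrt

# Proportion correct per item (items 1-60), transcribed from Table 3
BLACK = [
    1.00, .97, .97, 1.00, 1.00, .99, .94, .91, 1.00, .91, .83, .68,     # Set A
    1.00, 1.00, 1.00, .91, .96, .85, .77, .79, .83, .92, .80, .59,      # Set B
    1.00, .96, .96, .86, .94, .76, .88, .50, .74, .61, .53, .06,        # Set C
    1.00, .99, .89, .92, .96, .92, .77, .76, .71, .79, .29, .26,        # Set D
    .74, .64, .79, .56, .52, .35, .42, .21, .30, .12, .02, .11,         # Set E
]
OTHER = [  # White, Indian, and Colored students
    1.00, 1.00, 1.00, .97, 1.00, 1.00, .97, .93, .97, .97, .90, .83,    # Set A
    1.00, 1.00, 1.00, .97, .97, 1.00, .66, .97, .97, 1.00, .90, .83,    # Set B
    .97, 1.00, 1.00, .93, .97, .83, .97, .79, .90, .79, .69, .35,       # Set C
    1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, .93, .97, .93, .41, .38,  # Set D
    .90, .90, .97, .83, .83, .76, .79, .69, .41, .41, .17, .21,         # Set E
]


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


r_items = pearson(BLACK, OTHER)
print(len(BLACK), round(r_items, 2))
```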

Table 4 shows the means and standard deviations of the total RSPM scores for the four groups, along with the d effect sizes representing the difference between pre- and posttest scores (Cohen, 1988). First, we examined whether there was an effect of race (Black vs. White/Indian/Colored) and group (experimental vs. control) on the pretest scores. There was a significant effect of race (F(1, 91) = 24.13, p = .00, η² = .21) but not of group (F(1, 91) = 2.28, p = .14, η² = .02). This means that the mean pretest scores of Blacks (M = 44.44, SD = 6.65) were lower than those of White/Indian/Coloreds (M = 51.41, SD = 5.05), and that the mean pretest scores of the experimental and control groups were comparable (M = 45.53, SD = 7.04 and M = 48, SD = 6.7, respectively).

Second, we investigated the effects of training on the posttest scores by performing a two-way ANCOVA on the total posttest scores, with race and group as factors and the total pretest scores as the covariate. There was a significant effect of group (F(1, 95) = 13.81, p = .00, η² = .13) and of race (F(1, 90) = 3.99, p = .05, η² = .04), but not of the two-way interaction of group and race (F(1, 90) = 0.28, p = .60, η² = .00). These results indicate that the training was equally effective for the Black and the White/Indian/Colored students. Posttest scores of Blacks (M = 49.41, SD = 5.91), however, remained significantly lower (F(1, 91) = 28.33, p = .00) than those of Whites/Indians/Coloreds (M = 55.59, SD = 3.55). Although posttest scores of the experimental group (M = 51.65, SD = 5.53) were higher than those of the control group (M = 50.8, SD = 6.65), the difference between the two groups was nonsignificant (F(1, 91) = 0.85, p = .36).

Table 4. Pre- and posttest mean Raven's scores, standard deviations, and mean effect sizes for Black and White/Indian/Colored students

                  Black experimental    Black control         Other(a) experimental Other(a) control
                  (n = 40)              (n = 26)              (n = 15)              (n = 14)
                  Pretest    Posttest   Pretest    Posttest   Pretest    Posttest   Pretest    Posttest
Raw score, M      43.78      50.10      45.46      48.35      50.20      55.80      52.71      55.36
SD                6.64       5.31       6.69       6.71       6.05       3.76       3.45       3.43
Percentile        14         41         16         31         41         75         55         68
Effect size       0.95                  0.43                  0.93                  0.77

Note. Percentiles are based on US adult norms; see Raven, Raven, and Court's (2000) Table SPM13.
a Other = White, Indian, and Colored.

The correlation between scores before and after the training was .84 (p = .00) for the experimental group and .90 (p = .00) for the control group, showing that the training had only a limited effect on the rank order of individuals' scores. This means that the test strongly, but not perfectly, measures the same constructs on both occasions.

16.2. Correlation between score gains and g loadedness

We estimated effect sizes for each of the four groups (race by condition) by computing the difference between the mean pretest and posttest scores, divided by the standard deviation of the pretest scores of the Black and the White/Indian/Colored students, respectively. We then calculated the correlations between these effect sizes and the g loadings taken from Lynn et al. Correlations were −.24 (p = .10) for the Black experimental group, −.21 (p = .20) for the White/Indian/Colored experimental group, −.08 (p = .59) for the Black control group, and −.41 (p = .01) for the White/Indian/Colored control group. Small sample sizes usually attenuate correlations (Hunter & Schmidt, 1990), and collapsing the groups indeed resulted in higher average correlations: −.39 for the complete experimental group and −.26 for the complete control group.

16.3. g loadings

Using the combined experimental and control group, principal axis factor analyses of the pretest and posttest scores, respectively, resulted in a first unrotated factor explaining 22% of the variance in the pretest scores and 18% of the variance in the posttest scores. These findings suggest that the g loadedness of the RSPM decreased substantially after Mediated Learning Experience.
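As a rough illustration of how the share of variance explained by a first unrotated factor can be obtained, the sketch below runs iterated principal axis factoring with a single factor on a correlation matrix. It assumes one extracted factor and squared multiple correlations as starting communalities; the authors' exact extraction settings are not given in this section, so the function is hypothetical and the toy matrix is synthetic.

```python
import numpy as np


def first_factor_paf(R, max_iter=1000, tol=1e-8):
    """One-factor iterated principal axis factoring of a correlation matrix R.

    Returns the loadings on the first unrotated factor and the proportion of
    the total (standardized) variance that this factor explains.
    """
    R = np.asarray(R, dtype=float)
    p = R.shape[0]
    # Starting communalities: squared multiple correlations.
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    for _ in range(max_iter):
        reduced = R.copy()
        np.fill_diagonal(reduced, h2)               # communalities on the diagonal
        eigvals, eigvecs = np.linalg.eigh(reduced)  # eigenvalues in ascending order
        loadings = eigvecs[:, -1] * np.sqrt(max(eigvals[-1], 0.0))
        new_h2 = loadings ** 2
        converged = np.max(np.abs(new_h2 - h2)) < tol
        h2 = new_h2
        if converged:
            break
    if loadings.sum() < 0:                          # eigenvector sign is arbitrary
        loadings = -loadings
    return loadings, float(np.sum(loadings ** 2) / p)


# Toy check: a correlation matrix generated by a single factor.
lam = np.array([0.8, 0.7, 0.6, 0.5])
R = np.outer(lam, lam)
np.fill_diagonal(R, 1.0)
loadings, share = first_factor_paf(R)   # share is close to mean(lam**2) = 0.435
```

Applied separately to pretest and posttest item correlations, a drop in `share` would correspond to the decrease from 22% to 18% described above.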

16.4. Correlation between score gains and sum score

Correlating score gains with RSPM total scores resulted in values of −.60 (p = .00) for the Black experimental group, −.18 (p = .38) for the Black control group, −.82 (p = .00) for the White/Indian/Colored experimental group, and −.48 (p = .08) for the White/Indian/Colored control group. After applying the correction formula of Tucker et al. (1966), these correlations became −.39, −.08, −.61, and −.35, respectively. Overall, these correlations show that low-g persons improved their scores more strongly than high-g persons.

17. Discussion

Skuy et al. (2002) hypothesized that the low-quality education of Blacks in South Africa would lead IQ tests to underestimate their cognitive abilities. Groups of Black and White/Indian/Colored students took Raven's Progressive Matrices twice and in between received Feuerstein's Mediated Learning Experience. The test scores went up substantially in all groups. Evidence for an authentic change in the g factor requires broad transfer, that is, generalizability across a wide variety of cognitive performance. However, Skuy et al. showed that the gains did not generalize to scores on another, highly similar test or to external criteria, and were therefore hollow. As the score gains were in some cases quite large (14 IQ points for the Black experimental group), the question becomes: what is it that improved?
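Expressing a raw-score gain in IQ points is simple arithmetic: d = (posttest mean − pretest mean) / pretest SD, multiplied by the SD of the IQ scale. The Black groups' means are not recoverable from this excerpt, so the sketch below uses the White/Indian/Colored experimental group's RSPM values from the partially preserved table (pretest M = 50.20, SD = 6.05; posttest M = 55.80) and assumes an IQ-scale SD of 15; the function name is hypothetical.

```python
def gain_in_iq_points(pre_mean, post_mean, pre_sd, iq_sd=15.0):
    """Standardized gain d and its equivalent on an IQ scale with SD iq_sd."""
    d = (post_mean - pre_mean) / pre_sd
    return d, d * iq_sd


# White/Indian/Colored experimental group, RSPM raw scores.
d, iq_points = gain_in_iq_points(pre_mean=50.20, post_mean=55.80, pre_sd=6.05)
print(f"d = {d:.2f}, about {iq_points:.0f} IQ points")
# prints: d = 0.93, about 14 IQ points
```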

The findings show that the correlations between score gains and the g loadedness of the items were −.39 for the complete experimental group and −.26 for the complete control group. However, because the g loadings and gain scores are measured at the item level, their reliabilities are not high, which substantially attenuates the correlation between g and d. Moreover, the RSPM does not measure g perfectly: Jensen (1998a, p. 91) estimates its g loading at .83. When we estimate the reliability of the g vector at .70 and the reliability of the gain score vector at .50, correcting for unreliability and for the deviation from perfect construct validity of g alone results in estimated true correlations of −.80 and −.53, respectively. These values should be taken as underestimates; controlling for additional artifacts would bring them closer to the very strong negative correlation found in the meta-analysis.
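The correction in the preceding paragraph divides the observed vector correlation by the square root of the product of the two vector reliabilities and then by the RSPM's g loading. A minimal sketch, using the values stated in the text (.70, .50, and .83) as defaults; the function name is hypothetical.

```python
import math


def corrected_vector_correlation(r_observed, rel_g_vector=0.70,
                                 rel_gain_vector=0.50, g_loading_of_test=0.83):
    """Correct an observed g-by-gain vector correlation for unreliability of
    both vectors and for the test's imperfect g loading."""
    disattenuated = r_observed / math.sqrt(rel_g_vector * rel_gain_vector)
    return disattenuated / g_loading_of_test


for r in (-0.39, -0.26):
    print(r, round(corrected_vector_correlation(r), 2))
# prints -0.39 -0.79, then -0.26 -0.53
```

This yields about −.79 and −.53; the small difference from the reported −.80 presumably reflects rounding in the original computation.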

The findings suggest that after training the g loadedness of the test decreased substantially. We found substantial negative correlations between gain scores and RSPM total scores. Table 4 shows that the total score variance decreased after training, which is in line with low-g subjects increasing more than high-g subjects. Since, as a rule, high-g individuals profit the most from training (as is reflected in the ubiquitous positive correlation between IQ scores and training performance; Jensen, 1980; Schmidt & Hunter, 1998),


these findings could be interpreted as an indication that Feuerstein's Mediated Learning Experience is not g-loaded, in contrast with regular trainings, which are clearly g-loaded. Substantial negative correlations between gain scores and RSPM total scores are no definite proof of this hypothesis, but they are in line with it. Additional support for our hypothesis that the Feuerstein training has little or no g loadedness comes from Coyle (2006), who showed that gain scores loaded virtually zero on the g factor. Moreover, Skuy et al. reported that the predictive validity of their measure did not increase when the second Raven score was used. The fact that individuals with low g gained more than those with high g could be interpreted as an indication that the Mediated Learning Experience was not g-loaded. It should be noted, however, that Feuerstein most likely did not intend his intervention to be g-loaded; he was interested in increasing the performance of low scorers on both tests and external criteria.

18. General discussion

IQ scores are by far the best general predictor of success in education, job training, and work. However, there are many ways in which these IQ scores can be increased, for instance by means of retesting or participating in a learning potential training program. What conclusions can be drawn from such score gains? Jensen's (1998a) hypothesis that the effects of training on abilities can be summarized in terms of Carroll's three-stratum hierarchical factor model was tested in a meta-analysis of test–retest data from Dutch, British, and American test batteries, and with learning potential data from South Africa using Raven's Progressive Matrices. The meta-analysis convincingly shows that test–retest score gains are not g-loaded. The findings from the learning potential study are clearly in line with this: when the attenuation caused by unreliability and other artifacts is taken into account, the correlation between the g loadings of items and the gains on items is somewhat comparable to the one found in the meta-analysis for test batteries. The data suggest that the g loadedness of item scores decreases after the intervention training. Te Nijenhuis et al.'s (2001) finding that practice and coaching reduced the g loadedness of their test scores strengthens the present findings using item scores. The findings show that it is not the high-g participants who increase their scores the most, as is common in training situations; rather, the low-g persons show the largest increases. This suggests that the intervention training is not g-loaded.

Our findings fit quite well with the hierarchical model of intelligence. The generalizability of test scores resides predominantly in the g component, whereas the test-specific ability component and the narrow ability component are virtually non-generalizable. This is evidenced, for instance, by the earlier finding that adding verbal or numerical tests to a g score resulted in only a very small incremental validity (Ree & Earles, 1991; Ree et al., 1994). Additionally, Ericsson and Lehmann (1996) reported immense gains on a memory task focusing on one narrow ability, but did not find any improvement on comparable memory tasks focusing on another narrow ability. As the score gains are not related to g, the generalizable g component decreases, and since the Feuerstein training itself is plausibly not g-loaded, it is easy to understand why the score gains did not generalize to scores on the cognitively loaded Representational Stencil Design Test. For a similar reason, the score gains did not generalize to g-loaded external criteria: the correlation of the RSPM scores with performance in the end-of-year psychology examination did not significantly improve after mediation. Reeve and Lam (2005) claimed that retesting does not change the nature of what is being tested, but our findings suggest the opposite.

19. Limitations of the studies

Our meta-analysis and our analysis of the South African study rely strongly on the method of correlated vectors (MCV), which has recently been shown to have limitations. Dolan and Lubke (2001) have shown that, when groups are compared, substantial positive vector correlations can still be obtained even when the groups differ not only on g but also on factors uncorrelated with g. Ashton and Lee (2005) show that associations of a variable with non-g sources of variance can produce a vector correlation of zero even when the variable is strongly associated with g. They suggest that the g loadings of a subtest are sensitive to the nature of the other subtests in a battery, so that a specific sample of subtests may cause a spurious correlation between the vectors. Notwithstanding these limitations, studies using MCV continue to appear (see, for instance, Colom, Jung, & Haier, in press; Hartmann, Kruuse, & Nyborg, in press; Lee et al., 2006). The outcomes of our meta-analysis of a large number of studies using MCV may make an interesting contribution to the discussion of the limitations of the method.

A principle of meta-analysis is that the amount of information contained in one individual study is quite modest. Therefore, one should carry out an analysis of


all studies on one topic and correct for artifacts, leading to a strong increase in the amount of information. The fact that our meta-analytical value of r = −1.06 is virtually identical to the theoretically expected correlation between g and d of −1.00 holds some promise that psychometric meta-analysis of studies using MCV is a powerful way of reducing some of the limitations of MCV. An alternative methodological approach is to limit oneself to the rare datasets that allow structural equation modeling. However, from a meta-analytical point of view, these studies yield only a quite modest amount of information.
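Psychometric meta-analysis of the Hunter and Schmidt type starts from a "bare-bones" step that pools correlations weighted by sample size and estimates how much of their spread is expected from sampling error alone; artifact corrections then divide the mean by the product of attenuation factors. The sketch below is a generic illustration with hypothetical inputs and function names, not the authors' computation; it also shows how a corrected value such as r = −1.06 can exceed 1 in absolute value.

```python
def bare_bones_meta(rs, ns):
    """Hunter-Schmidt style bare-bones meta-analysis of correlations.

    Returns the N-weighted mean r, the N-weighted observed variance of r,
    the variance expected from sampling error alone, and the residual variance.
    """
    total_n = sum(ns)
    r_bar = sum(n * r for r, n in zip(rs, ns)) / total_n
    var_obs = sum(n * (r - r_bar) ** 2 for r, n in zip(rs, ns)) / total_n
    n_bar = total_n / len(ns)
    var_err = (1 - r_bar ** 2) ** 2 / (n_bar - 1)
    var_res = max(var_obs - var_err, 0.0)
    return r_bar, var_obs, var_err, var_res


def correct_mean_r(r_bar, attenuation_factors):
    """Divide the mean r by the product of artifact attenuation factors
    (e.g. square roots of reliabilities). The result is not bounded by |1|."""
    a = 1.0
    for factor in attenuation_factors:
        a *= factor
    return r_bar / a


# Hypothetical input: two studies and two attenuation factors.
r_bar, var_obs, var_err, var_res = bare_bones_meta([-0.90, -0.80], [100, 100])
rho = correct_mean_r(r_bar, [0.9, 0.9])   # -0.85 / 0.81, i.e. below -1
```

Because sampling error in the mean r is not removed by the corrections, a corrected estimate can overshoot the theoretical bound of −1.00, as with the −1.06 reported above.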

Additional meta-analyses of studies employing MCV are necessary to establish the validity of combining MCV with psychometric meta-analysis. Most likely, many would agree that a high positive meta-analytical correlation between measures of g and measures of another construct implies that g plays a major role, and that a meta-analytical correlation of −1.00 implies that g plays no role. However, it is not clear what value of the meta-analytical correlation to expect from MCV when g plays only a modest role. After the present meta-analysis on a construct that clearly has an inverse relationship with g, it would be informative to carry out meta-analyses of studies on variables that are strongly linked to g and on variables that are modestly linked to g. An example of the latter would be secular score gains, which according to Lynn's (1990) nutrition theory should be modestly g-loaded.

The sample sizes in the South African study are not large, but they are still larger than those in many other studies of learning potential, where an N ≈ 10 is not unusual. A reanalysis of the many existing studies on dynamic testing could lead to a meta-analysis with a large combined N. The mean posttest score was quite high, so a ceiling effect may have occurred for the White/Indian/Colored group, leading to an underestimation of the experimental score gain for this group.

Instead of testing the hypothesis with a strongly unidimensional test such as the RSPM, it would be better to use a multidimensional test. Moreover, a large sample size would allow the use of more rigorous data-analytical techniques, leading to more definitive results. However, to the best of our knowledge, datasets meeting these requirements do not exist, and the Skuy et al. study is arguably the best South African learning potential study.

20. Score gains as low-quality measures of motivation

As criterion-related validity is strongly dependent on g, te Nijenhuis et al.'s (2001) finding of lowered g loadings after training should result in lowered criterion-related validity. However, the empirical findings show the opposite: virtually all test–retest and test preparation studies on cognitive tests and scholastic aptitude tests that reported both criterion-related validities demonstrate small to modest increases in criterion-related validity for the second or third test score (see Allalouf & Ben-Shakhar, 1998; Bashi, 1976; Coyle, 2006; Hausknecht, Trevor, & Farr, 2002; Jones, 1986; Linn, 1977; Olsen & Schrader, 1959; Ortar, 1960; Powers, 1985; Reeve & Lam, 2005). In the carefully designed study by Allalouf and Ben-Shakhar (1998) of a university entrance test, the experimental group received an intensive 40-h test coaching program while the control group did not. The criterion-related validity of the retest increased for both groups. Most importantly, the increase was the same: it was not larger for the experimental group.

In a little-known but carefully designed large-scale learning potential study, Resing (1990; see Table 4.23) compared an experimental group that received a pretest, a learning potential training, and a posttest with a control group that received only the pretest and the posttest. The mean criterion-related validity of the various second scores was .62 for both the experimental and the control group. Learning potential training did not result in incremental criterion-related validity over and above the validity resulting from simply retesting. The findings of both Resing and Allalouf and Ben-Shakhar suggest that cognitive interventions do not increase criterion-related validity more than simple retesting does.

g and the personality measure conscientiousness have been shown to make an excellent combination of predictors (Schmidt & Hunter, 1998). Conscientiousness represents, among other characteristics, persistence, a will to achieve, and the ability to focus effort on a goal. A field study on test preparation using actual job applicants (Clause, Delbridge, Schmitt, Chan, & Jennings, 2001) showed that motivation to perform well on the test correlated .25 with test performance. One could speculate that score increases do not reflect a true cognitive component but rather become low-quality measures of motivation. Further, since the increase in validity due to retesting and learning potential training is modest in comparison to the large increase obtainable from the use of personality questionnaires, personality testing might provide a less expensive and more accurate alternative.
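The incremental-validity argument can be made concrete with the standard formula for the multiple correlation of a criterion with two predictors: R² = (r₁² + r₂² − 2·r₁·r₂·r₁₂) / (1 − r₁₂²). The validities below are hypothetical placeholders chosen for illustration, not values from the studies cited, and the function name is hypothetical.

```python
import math


def multiple_r(r1, r2, r12):
    """Multiple correlation of a criterion with two predictors.

    r1, r2: validities of predictors 1 and 2; r12: predictor intercorrelation.
    """
    r_squared = (r1 ** 2 + r2 ** 2 - 2 * r1 * r2 * r12) / (1 - r12 ** 2)
    return math.sqrt(r_squared)


# Hypothetical validities: g test .50, conscientiousness .30, uncorrelated.
R = multiple_r(0.50, 0.30, 0.0)
print(round(R, 3), round(R - 0.50, 3))   # prints 0.583 0.083
```

When the second predictor is nearly uncorrelated with g, as the text assumes for conscientiousness, its whole validity adds to the composite, which is the source of the comparatively large increase.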

21. Effectiveness of various training formats

Components of the mediation training used by Skuy et al. (2002) are similar to the test training used in te Nijenhuis et al. (2001). Both the Dutch training and the South African training took 3 h, but whereas the Dutch training focused on two different test formats, the South African training dealt with only one test format. The test training by Lloyd and Pidgeon (1961) took even less time, namely two half-hour segments, each focusing on one test format. The effect sizes in all studies were roughly comparable. This suggests that the methods employed by te Nijenhuis et al. and by Lloyd and Pidgeon were more efficient than those used by Skuy et al. It is possible that the components of the mediation training that are absent from the other two training formats are not effective in raising test scores and could therefore be left out. If so, it might be possible to increase scores on the RSPM by one SD with a relatively simple 1-h training.

22. Generalizability of findings

Can these findings of hollow score gains after test–retest, test practice, and Mediated Learning Experience training be generalized to other studies in which training-induced score gains were found? Ericsson and Lehmann (1996) reported tremendous score increases after intensive training on numeric memory tests, but these gains did not generalize in the least to verbal memory tests. Such gains on one narrow ability do not generalize to another narrow ability clustering under the same broad ability and are therefore hollow. Similarly, Jensen (1998b) showed that score gains due to adoption were not on the g factor and were therefore most likely hollow.

Rushton (1999) argued that intergenerational score gains are not linked to g, suggesting that the Flynn effect may be empty, but he was strongly criticized by Flynn (1999, 2000). In studies on the Flynn effect, score gains found in cross-sectional studies are largest on the RSPM (Flynn, 1987). Lynn (1998) has suggested that a substantial part of these intergenerational score gains on the RSPM is generalizable (they do reflect higher g), but that the remaining part is hollow and should be interpreted as a schooling effect. The RSPM does require the application of the mathematical principles of addition, subtraction, progression, and the distribution of values. In the three decades (1950s–1980s) over which these increases in RSPM scores occurred, increasing proportions of 15- to 18-year-olds remained in school, where they learned math skills that they could apply to the solution of matrices problems. Our findings could be interpreted as support for Lynn's hypothesis of the partial hollowness of score gains on the RSPM. Notwithstanding the high g loading of the sum score of the RSPM, it is quite sensitive to test–retest effects and training effects. Some studies on the Flynn effect (Lynn & Hampson, 1986; Teasdale & Owen, 1989) show that the increase in scores is largely concentrated in the lower segments of the IQ distribution. Our finding that low scorers show the largest gains after training may additionally support the notion that part of the Flynn effect on the RSPM is hollow. Finally, Wicherts et al.'s (2004) findings show that in some of their datasets the secular score gains are most strongly linked to broad, narrow, and test-specific abilities, showing that an important part of the gains is non-generalizable.

Ceci (1991) showed that increased schooling leads to higher IQ scores, but are these gains highly specific or predominantly generalizable? It would be interesting to apply the techniques we used in this study to the findings of previous intervention studies. It may be that biological interventions (such as diet, vitamin supplements, and vaccination against infectious disease), rather than psychological or educational interventions, are the most cost-effective method of producing true changes in g and broad abilities. There may be a biological barrier between the first stratum and the second stratum that restricts the effects of behavioral interventions to narrow abilities and test specificities.

Acknowledgement

We would like to thank Mervyn Skuy for his permission to use his dataset.

Thanks to Marié de Beer, Raegan Murphy, Welko Tomic, Art Jensen, and Frank Schmidt for feedback on previous versions of this paper.

Thanks to Arne Evers, Wilma Resing (Dutch Test Committee), and Andress Kooij (Harcourt) for their help in locating test–retest studies.

References

Ackerman, P. L. (1986). Individual differences in information processing: An investigation of intellectual abilities. Intelligence, 10, 101–139.

Ackerman, P. L. (1987). Individual differences in skill learning: An integration of psychometric and information processing perspectives. Psychological Bulletin, 102, 3–27.

Allalouf, A., & Ben-Shakhar, G. (1998). The effect of coaching on the predictive validity of scholastic aptitude tests. Journal of Educational Measurement, 35(1), 31–47.

Ashton, M. C., & Lee, K. (2005). Problems with the method of correlated vectors. Intelligence, 33, 431–444.

Bashi, Y. (1976). Verbal and non-verbal abilities of 4th, 6th, and 8th grade students in the Arab educational system in Israel. Jerusalem: Hebrew University, School of Education.


Bleichrodt, N., Resing, W. C. M., Drenth, P. J. D., & Zaal, J. N. (1987). Intelligentie-meting bij kinderen: Empirische en methodologische verantwoording van de geReviseerde Amsterdamse Kinder Intelligentie Test [Measuring the intelligence of children: Empirical and methodological justification of the Revised Amsterdam Children Intelligence Test]. Lisse, the Netherlands: Swets.

Bennett, G. K., Seashore, H. G., & Wesman, A. G. (1974). Differential Aptitude Tests (5th ed.): Manual. New York: The Psychological Corporation.

Boeyens, J. C. A. (1989). Learning potential: An empirical investigation. Pretoria, South Africa: Human Sciences Research Council.

Bosch, F. (1973). Inventarisatie, beschrijving en onderzoek m.b.t. de wijzigingen van de GATB, incl. test-hertest onderzoek (No. Pz3bRp0120) [Stock-taking, description, and research concerning the modifications of the GATB, including a test–retest study]. Utrecht, the Netherlands: Nederlandse Spoorwegen.

Carroll, J. B. (1993). Human cognitive abilities: A survey of factor-analytic studies. Cambridge: Cambridge University Press.

Ceci, S. J. (1991). How much does schooling influence general intelligence and its cognitive components? A reassessment of the evidence. Developmental Psychology, 27, 703–722.

Christian, K., Bachman, H. J., & Morrison, F. J. (2001). Schooling and cognitive development. In R. J. Sternberg & E. L. Grigorenko (Eds.), Environmental effects on cognitive abilities (pp. 287–335). Mahwah, NJ: Erlbaum.

Clause, C. S., Delbridge, K., Schmitt, N., Chan, D., & Jennings, D. (2001). Test preparation activities and employment test performance. Human Performance, 14, 149–167.

Cohen, J. (1988). Statistical power analysis for the behavioral sciences. Hillsdale, NJ: Lawrence Erlbaum.

Colom, R., Jung, R. E., & Haier, R. J. (in press). Finding the g-factor in brain structure using the method of correlated vectors. Intelligence.

Covin, T. A. (1977). Stability of the WISC-R for 9-year-olds with learning difficulties. Psychological Reports, 40, 1297–1298.

Coyle, T. R. (2006). Test–retest changes on scholastic aptitude tests are not related to g. Intelligence, 34, 15–27.

Cronbach, L. J. (1990). Essentials of psychological testing. New York: HarperCollins.

de Villiers, A. B. (1999). Disadvantaged students' academic performance: Analysing the zone proximal development. Unpublished D.Phil. thesis, University of Cape Town, South Africa.

de Wolff, C. J., & Buiten, B. (1963). Een factoranalyse van vier testbatterijen [A factor analysis of four test batteries]. Nederlands Tijdschrift voor Psychologie, 18, 220–239.

Dolan, C. V., & Lubke, G. (2001). Viewing Spearman's hypothesis from the perspective of multigroup PCA: A comment on Schönemann's criticism. Intelligence, 29, 231–245.

Drenth, P. J. D., Petrie, J. F., & Bleichrodt, N. (1968). Handleiding bij de Amsterdamse Kinder Intelligentie Test [Manual of the Amsterdam Children Intelligence Test]. Amsterdam: Vrije Universiteit.

Elliott, C. D. (1983). British Ability Scales: Manual 2. Technical handbook. Windsor, Great Britain: NFER-Nelson.

Engelbrecht, M. (1999). Leerpotensiaal as voorspeller van akademiese sukses van universiteitsstudente [Learning potential as predictor of the academic success of university students]. Unpublished D.Phil. thesis, Potchefstroom University for Christian Higher Education, South Africa.

Ericsson, K. A., & Lehmann, A. C. (1996). Expert and exceptional performance: Evidence of maximal adaptation to task constraints. Annual Review of Psychology, 47, 273–305.

Evers, A., & Lucassen, W. (1991). Handleiding DAT 83: Differentiële Aanleg Testserie [Manual DAT 83: Differential Aptitude Test series]. Amsterdam: Swets.

Fleishman, E. A., & Hempel, W. E. (1955). The relation between abilities and improvement with practice in a visual discrimination reaction task. Journal of Experimental Psychology, 49, 301–312.

Flynn, J. R. (1987). Massive IQ gains in 14 nations: What IQ tests really measure. Psychological Bulletin, 101, 171–191.

Flynn, J. R. (1999). Evidence against Rushton: The genetic loading of WISC-R subtests and the causes of between-group IQ differences. Personality and Individual Differences, 26, 373–379.

Flynn, J. R. (2000). IQ gains, WISC subtests and fluid g: g theory and the relevance of Spearman's hypothesis to race. In G. R. B. J. Goode (Ed.), The nature of intelligence (pp. 202–227). New York: Wiley.

Gaydon, V. P. (1988). Predictors of performance of disadvantaged adolescents on the Soweto/Alexandra gifted child programme. Unpublished M.Ed. dissertation, University of the Witwatersrand, South Africa.

Gottfredson, L. S. (1997). Why g matters: The complexity of everyday life. Intelligence, 24(1), 79–132.

Gottfredson, L. S. (2002). g: Highly general and highly practical. In R. J. Sternberg & E. L. Grigorenko (Eds.), The general intelligence factor: How general is it? (pp. 331–380). Mahwah, NJ: Erlbaum.

Grigorenko, E. L., & Sternberg, R. J. (1998). Dynamic testing. Psychological Bulletin, 124, 75–111.

Haeck, W., Yeld, N., Conradie, J., Robertson, N., & Shall, A. (1997). A developmental approach to mathematics testing for university admissions and course placement. Educational Studies in Mathematics, 33, 71–91.

Hartmann, P., Kruuse, N. H. S., & Nyborg, H. (in press). Testing the cross-racial generality of Spearman's hypothesis in two samples. Intelligence.

Hausknecht, J. P., Trevor, C. O., & Farr, J. L. (2002). Retaking ability tests in a selection setting: Implications for practice effects, training performance, and turnover. Journal of Applied Psychology, 87(2), 243–254.

Hunter, J. E., & Schmidt, F. L. (1990). Methods of meta-analysis. London: Sage.

Hunter, J. E., & Schmidt, F. L. (2004). Methods of meta-analysis (2nd ed.). London: Sage.

Jensen, A. R. (1980). Bias in mental testing. London: Methuen.

Jensen, A. R. (1985). The nature of the black–white difference on various psychometric tests: Spearman's hypothesis. Behavioral and Brain Sciences, 8, 193–263.

Jensen, A. R. (1998a). The g factor: The science of mental ability. London: Praeger.

Jensen, A. R. (1998b). Adoption data and two g-related hypotheses. Intelligence, 25, 1–6.

Johnson, W., Bouchard, T. J., Jr., Krueger, R. F., McGue, M., & Gottesman, I. I. (2004). Just one g: Consistent results from three test batteries. Intelligence, 32, 95–107.

Johnson, W., te Nijenhuis, J., & Bouchard, T. J., Jr. (in press). Replication of the hierarchical visual-perceptual-image rotation model in de Wolff and Buiten's (1963) battery of 46 tests of mental ability. Intelligence.

Jones, R. J. (1986). A comparison of the predictive validity of the MCAT for coached and uncoached students. Journal of Medical Education, 61, 335–338.

Kaufman, A. S., & Kaufman, N. L. (1983). K-ABC: Kaufman Assessment Battery for Children. Interpretive manual. Circle Pines, MN: AGS.


Kooij, A. P., Rolfhus, E., Wilkins, C., Yang, Z., & Zhu, J. (2005). WAIS-III Nederlandstalige bewerking: Technisch rapport hernormering [WAIS-III adaptation in Dutch: Technical report on renorming]. Amsterdam: Harcourt.

Kort, W., Schittekatte, M., Dekker, P. H., Verhaeghe, P., Compaan, E. L., Bosmans, M., & Vermeir, G. (2005). WISC-III NL: Wechsler Intelligence Scale for Children, Derde Editie NL. Handleiding en verantwoording [The Dutch WISC-III: Wechsler Intelligence Scale for Children, Third Edition for the Netherlands. Manual and justification]. Amsterdam: NIP.

Kulik, J. A., Bangert-Drowns, R. L., & Kulik, C. C. (1984). Effectiveness of coaching for aptitude tests. Psychological Bulletin, 95, 179–188.

Kulik, J. A., Kulik, C. C., & Bangert, R. L. (1984). Effects of practice on aptitude and achievement test scores. American Educational Research Journal, 21, 435–447.

Lee, K. H., Choi, Y. Y., Gray, J. R., Cho, S. H., Chae, J.-H., Lee, S., et al. (2006). Neural correlates of superior intelligence: Stronger recruitment of posterior parietal cortex. NeuroImage, 29(2), 578–586.

Linn, R. L. (1977). On the treatment of multiple scores for Law School Admission Test repeaters (Report LSAC-77-4). In Law School Admission Council, Reports of LSAC sponsored research: Volume III, 1975–1977. Princeton, NJ: Law School Admission Council.

Lipson, L. E. (1992). Relationship of static and dynamic measures to scholastic achievement of black pupils. Unpublished M.Ed. dissertation, University of the Witwatersrand, South Africa.

Lloyd, F., & Pidgeon, D. A. (1961). An investigation into the effects of coaching on non-verbal test material with European, Indian and African children. British Journal of Educational Psychology, 31, 145–151.

Luteijn, F., & Barelds, D. P. H. (2005). GIT2: Groninger Intelligentie Test 2 [GIT2: Groningen Intelligence Test 2]. Amsterdam: Harcourt.

Lynn, R. (1990). The role of nutrition in secular increases in intelligence. Personality and Individual Differences, 11, 273–285.

Lynn, R. (1998). In support of the nutrition theory. In U. Neisser (Ed.), The rising curve: Long-term gains in IQ and related measures (pp. 207–215). Washington, DC: American Psychological Association.

Lynn, R., Allik, J., & Irwing, P. (2004). Sex differences on three factors identified in Raven's Standard Progressive Matrices. Intelligence, 32, 411–424.

Lynn, R., & Hampson, S. (1986). The rise of national intelligence: Evidence from Britain, Japan and the USA. Personality and Individual Differences, 7, 23–32.

Matarazzo, J. D., Carmody, T. P., & Jacobs, L. D. (1980). Test–retest reliability and stability of the WAIS: A literature review with implications for clinical practice. Journal of Clinical Neuropsychology, 2(2), 89–105.

McCormick, B. K., Dunlap, W. P., Kennedy, R. S., & Jones, M. B. (1983). The effects of practice on the Armed Forces Vocational Aptitude Test Battery (Technical Report 602). U.S. Army Research Institute for the Behavioral and Social Sciences.

Mulder, J. L., Dekker, R., & Dekker, P. H. (2004). Kaufman Intelligentietest voor adolescenten en volwassenen (KAIT): Handleiding [Kaufman Intelligence Test for adolescents and adults (KAIT): Manual]. Leiden, the Netherlands: PITS.

Murphy, R. (2002). A review of South African research in the field of dynamic assessment. Unpublished MA dissertation, University of Pretoria. Available online from http://upetd.up.ac.za/thesis/available/etd-05042002-161239

Nel, A. (1997). Die voorspelling van akademiese sukses binne konteks van 'n alternatiewe universiteitstoelatingsbeleid [The prediction of academic success within the context of an alternative policy of university admission]. Unpublished MA dissertation, Rand Afrikaans University, South Africa.

Neubauer, A. C., & Freudenthaler, H. H. (1994). Reaction time in a sentence–picture verification test and intelligence: Individual strategies and effects of extended practice. Intelligence, 19, 193–218.

Nunnally, J. C., & Bernstein, I. H. (1994). Psychometric theory (3rd ed.). New York: McGraw-Hill.

Olsen, M., & Schrader, W. B. (1959). The use of preliminary and final Scholastic Aptitude Test scores in predicting college grades (College Entrance Examination Board Research and Development Reports and Statistical Reports 59-19). Princeton, NJ: Educational Testing Service.

Ortar, G. R. (1960). Improving test validity by coaching. Educational Research, 2, 137–142.

Powers, D. E. (1985). Effects of test preparation on the validity of a Graduate Admission Test. Applied Psychological Measurement, 9, 179–190.

Raven, J., Raven, J. C., & Court, J. H. (2000). Standard Progressive Matrices: Raven manual, Section 3. Oxford: Oxford Psychologists Press.

Ree, M. J., & Carretta, T. R. (1994). The correlation of general cognitive ability and psychomotor tracking tests. International Journal of Selection and Assessment, 2, 209–216.

Ree, M. J., & Earles, J. A. (1991). Predicting training success: Not much more than g. Personnel Psychology, 44, 321–332.

Ree, M. J., Earles, J. A., & Teachout, M. S. (1994). Predicting job performance: Not much more than g. Journal of Applied Psychology, 79, 518–524.

Reeve, C. L., & Lam, H. (2005). The psychometric paradox of practice effects due to retesting: Measurement invariance and stable ability estimates in the face of observed score changes. Intelligence, 33, 535–549.

Resing, W. C. M. (1990). Intelligentie en leerpotentieel: Een onderzoek naar het leerpotentieel van jonge leerlingen uit het basis- en speciaal onderwijs [Intelligence and learning potential: A study into the learning potential of young students in basic and special education]. Amsterdam, the Netherlands: Swets.

Rushton, J. P. (1999). Secular gains in IQ are not related to the g factor and inbreeding depression, unlike black–white differences: A reply to Flynn. Personality and Individual Differences, 26, 381–389.

Rushton, J. P., Skuy, M., & Bons, T. A. (2004). Construct validity of Raven's Advanced Progressive Matrices for African and non-African engineering students in South Africa. International Journal of Selection and Assessment, 12(3), 220–229.

Schmidt, F. L., & Hunter, J. E. (1998). The validity and utility of selection methods in personnel psychology: Practical and theoretical implications of 85 years of research findings. Psychological Bulletin, 124(2), 262–274.

Schmidt, F. L., & Hunter, J. E. (1999). Theory testing and measurement error. Intelligence, 27(3), 183–198.

Schmidt, F. L., & Le, H. (2004). Software for the Hunter–Schmidt meta-analysis methods. University of Iowa, Department of Management and Organization, Iowa City, IA 42242.

Schroots, J. J. F., & van Alphen de Veer, R. J. (1979). LDT: Leidse Diagnostische Test. Deel 1: Handleiding [LDT: Leiden Diagnostic Test. Part 1: Manual]. Lisse, the Netherlands: Swets.

Shochet, I. M. (1986). Manifest and potential performance in advantaged and disadvantaged students. Unpublished D.Phil. dissertation, University of the Witwatersrand, South Africa.


Skuy, M., Gewer, A., Osrin, Y., Khunou, D., Fridjon, P., & Rushton, J. P. (2002). Effects of mediated learning experience on Raven's Matrices scores of African and non-African university students in South Africa. Intelligence, 30, 221–232.

Swanson, H. E., & Lussier, C. M. (2001). A selective synthesis of the experimental literature on dynamic assessment. Review of Educational Research, 71, 321–363.

Teasdale, T. W., & Owen, D. R. (1989). Continuing secular increase in intelligence and a stable prevalence of high intelligence levels. Intelligence, 13, 255–262.

Tuma, J. M., & Appelbaum, A. S. (1980). Reliability and practice effects of WISC-R IQ estimates in a normal population. Educational and Psychological Measurement, 40, 671–678.

te Nijenhuis, J., Tolboom, E., Resing, W., & Bleichrodt, N. (2004). Does cultural background influence the intellectual performance of children from immigrant groups? Validity of the RAKIT intelligence test for immigrant children. European Journal of Psychological Assessment, 20, 10–26.

te Nijenhuis, J., & van der Flier, H. (1997). Comparability of GATB scores for immigrants and majority group members: Some Dutch findings. Journal of Applied Psychology, 82, 675–687.

te Nijenhuis, J., Voskuijl, O. F., & Schijve, N. B. (2001). Practice and coaching on IQ tests: Quite a lot of g. International Journal of Selection and Assessment, 9, 302–308.

Thorndike, R. L. (1985). The central role of general ability in prediction. Multivariate Behavioral Research, 20, 241–254.

Tucker, L. R., Damarin, F., & Messick, S. (1966). A base-free measure of change. Psychometrika, 31(4), 457–473.

van der Doef, M. P., Kwint, J. M., & van der Koppel (1989). Wat leren moeilijk lerende kinderen van de WISC-R? [What do children who have difficulties in learning learn from the WISC-R?] Kind en Adolescent, 10, 136–141.

United States Department of Labor. (1970). Manual for the USTES General Aptitude Test Battery. Section III: Development. Washington, DC: United States Department of Labor.

van Geffen (1972). De betrouwbaarheid van de GATB 1002-B op brugklasniveau [The reliability of the GATB 1002-B for the first class at secondary school]. Catholic University Nijmegen, the Netherlands: Psychology of Work and Organisation.

van Haasen, P. P., de Bruyn, E. E. J., Pijl, Y. J., Poortinga, Y. H., Lutje Spelberg, H. C., Vander Steene, G., et al. (1986). WISC-R: Wechsler Intelligence Scale for Children-Revised, Nederlandstalige uitgave [WISC-R: Wechsler Intelligence Scale for Children-Revised, Dutch edition]. Lisse, the Netherlands: Swets.

Wechsler, D. (1955). Manual for the Wechsler Adult Intelligence Scale. New York: The Psychological Corporation.

Wechsler, D. (1967). Manual for the Wechsler Preschool and Primary Scale of Intelligence. New York: The Psychological Corporation.

Wechsler, D. (1974). Manual for the Wechsler Intelligence Scale for Children-Revised. New York: The Psychological Corporation.

Wechsler, D. (1981). WAIS-R manual: Wechsler Adult Intelligence Scale-Revised. New York: The Psychological Corporation.

Wechsler, D. (1997). WAIS-III: Wechsler Adult Intelligence Scale-third edition, and WMS-III: Wechsler Memory Scale-third edition. Technical manual. New York: The Psychological Corporation.

Wicherts, J. W., Dolan, C. V., Oosterveld, P., van Baal, G. C. V., Boomsma, D. I., & Span, M. M. (2004). Are intelligence tests measurement invariant over time? Investigating the nature of the Flynn effect. Intelligence, 32(5), 509–537.

Yeld, N., & Haeck, W. (1997). Educational histories and academic potential: Can tests deliver? Assessment and Evaluation in Higher Education, 22, 5–16.

Zaaiman, H. (1998). Selecting students for Mathematics and Science: The challenge facing higher education in South Africa. Pretoria, South Africa: HSRC Publishers.

Zaaiman, H., van der Flier, H., & Thijs, G. D. (2001). Dynamic testing in selection for an educational programme: Assessing South African performance on the Raven Progressive Matrices. International Journal of Selection and Assessment, 9, 258–269.

Zolezzi, S. A. (1992). Alternative selection measures for university undergraduate admissions. Unpublished MEd dissertation, University of the Witwatersrand, South Africa.

Zolezzi, S. A. (1995). The effectiveness of dynamic assessment as an alternative aptitude testing strategy. Unpublished DPhil dissertation, University of South Africa, South Africa.

Table 1
Dutch, British, and US studies of correlations between g loadings and gain scores

| Reference | Test | r | N | Information |
|---|---|---|---|---|
| Drenth et al. (1968) | AKIT | −.57 | 100 | Primary-school children |
| van Geffen (1972) | GATB | −.45 | 42 | Secondary-school children |
| | | −.21 | 42 | |
| Bosch (1973) | GATB | −.07 | 43 | Secondary-school children |
| Schroots and van Alphen de Veer (1979) | LDT | −.42 | 96 | Pre-school and secondary-school children |
| Bleichrodt et al. (1987) | RAKIT | .09 | 49 | Pre-school children |
| | | −.25 | 51 | Primary-school children |
| | | −.21 | 49 | Primary-school children |
| van der Doef et al. (1989) | WISC-R | −.69 | 22 | Primary-school children with learning problems |
| Mulder et al. (2004) | KAIT | −.23 | 46 | Secondary-school children + young adults |
| | | −.42 | 25 | Adults |
| Kort et al. (2005) | WISC-III | −.15 | 42 | Primary-school children |
| | | −.26 | 67 | Primary-school children |
| | | −.46 | 39 | Secondary-school children |
| Luteijn and Barelds (2005) | GIT2 | −.51 | 44 | Adults |
| Kooij et al. (2005) | WAIS-III | −.63 | 60 | Adults |
| Elliott (1983) | BAS | −.65 | 60 | Primary-school children |
| Wechsler (1967) | WPPSI | −.46 | 50 | Pre-school children |
| United States Department of Labor (1970) | GATB | −.35 | 156 | Office applicants |
| | | −.66 | 605 | Male high school seniors |
| | | −.70 | 554 | Female high school seniors |
| | | −.58 | 223 | Males, 1-day interval |
| | | −.41 | 186 | Females, 1-day interval |
| | | −.50 | 202 | Males, 2-week interval |
| | | −.52 | 152 | Females, 2-week interval |
| | | −.67 | 156 | Males, 6-week interval |
| | | −.61 | 168 | Females, 6-week interval |
| | | −.43 | 176 | Males, 13-week interval |
| | | .02 | 149 | Females, 13-week interval |
| | | −.62 | 157 | Males, 26-week interval |
| | | −.32 | 136 | Females, 26-week interval |
| | | −.69 | 119 | Males, 1-year interval |
| | | −.31 | 183 | Females, 1-year interval |
| | | −.96 | 118 | Males, 2-year interval |
| | | −.75 | 170 | Females, 2-year interval |
| | | −.75 | 123 | Males, 3-year interval |
| | | −.48 | 183 | Females, 3-year interval |
| | | −.92 | 3,398 | Boys, secondary school |
| | | −.92 | 3,680 | Girls, secondary school |
| | | −.91 | 3,348 | Boys, secondary school |
| | | −.91 | 3,491 | Girls, secondary school |
| | | −.84 | 3,229 | Boys, secondary school |
| | | −.87 | 3,395 | Girls, secondary school |
| Wechsler (1974) | WISC-R | −.48 | 97 | Primary-school children |
| | | −.66 | 102 | Primary-school children |
| | | −.21 | 104 | Secondary-school children |
| Bennett et al. (1974) | DAT | −.79 | 92 | Boys, secondary school |
| | | −.53 | 81 | Girls, secondary school |
| | | −.29 | 81 | Boys, secondary school |
| | | −.62 | 100 | Girls, secondary school |
| Covin (1977) | WISC-R | −.57 | 30 | Primary-school children with learning problems |
| Tuma and Appelbaum (1980) | WISC-R | −.08 | 45 | Primary- and secondary-school children |
| Matarazzo et al. (1980) | WAIS | −.10 | 29 | Young males |
| Wechsler (1981) | WAIS-R | −.64 | 71 | Adults |
| | | −.48 | 48 | Adults |
| McCormick et al. (1983) | ASVAB | −.73 | 57 | Adults |
| Kaufman and Kaufman (1983) | K-ABC | −.27 | 46 | Pre-school children |
| | | −.18 | 36 | Pre- and primary-school children |
| | | −.22 | 70 | Primary-school children |
| Wechsler (1997) | WAIS-III | −.45 | 100 | Young adults |
| | | −.57 | 102 | Adults |
| | | −.51 | 104 | Adults |
| | | .03 | 88 | Adults |
| Reeve and Lam (2005) | EAS | −.34 | 123 | Undergraduate students |

Note. In general, the g loadings were based on the correlation matrix taken from the manuals containing the test–retest studies, or on the correlation matrix based on the largest sample size we could find. The sources of the g loadings that were not taken from the manuals containing the test–retest study are as follows. van Geffen (1972) and Bosch (1973): de Wolff and Buiten (1963); see also Johnson, te Nijenhuis, and Bouchard (in press). Bleichrodt et al. (1987): te Nijenhuis et al. (2004), who used the same data on which the RAKIT manual is based. van der Doef, Kwint, and van der Koppel (1989): Dutch WISC-R manual. Elliott (1983): Table 98, age 9:0–9:11 years. US Dept. of Labor's GATB (1970): Jensen (1985, p. 214), using the largest correlation matrix in the GATB manual. Wechsler (1974), Covin (1977), and Tuma and Appelbaum (1980): Jensen (1985, p. 214, first study). Bennett et al. (1974): average of four highly similar correlation matrices. Matarazzo et al. (1980): Wechsler's (1955, p. 17) Table 8, for ages 25–34. McCormick et al. (1983): Ree and Carretta (1994). Reeve and Lam (2005) used SEM analyses and item parcels instead of full-scale scores to compute g loadings; the average g loading of all the item parcels for a specific subtest was taken as the g loading of that subtest.


risk of overcorrection, we conservatively chose the value of .90 for the correction.

9 Results

The results of the studies on the correlation between g loadings and gain scores are shown in Table 1. The table gives data derived from 64 studies, with participants numbering a total of 26,990. For each study it gives the reference, the cognitive ability test used, the correlation between g loadings and gain scores, the sample size, and background information. Virtually all correlations are negative, and the few positive correlations are very small.
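To make the link between Table 1 and the first columns of Table 2 concrete, the following minimal sketch recomputes the sample-size-weighted mean correlation and its weighted SD from the (r, N) pairs transcribed from Table 1. It covers only this bare-bones step of a Hunter–Schmidt meta-analysis; no corrections for unreliability or range restriction are applied. The helper name `weighted_mean_sd` is our own and is not part of any meta-analysis package. The second computation rests on an assumption the text does not state: that the three excluded outliers are the three positive correlations.

```python
import math

# (r, N) pairs transcribed from Table 1, in table order (64 data points).
TABLE1 = [
    # Drenth et al. (1968) through Wechsler (1967)
    (-.57, 100), (-.45, 42), (-.21, 42), (-.07, 43), (-.42, 96),
    (.09, 49), (-.25, 51), (-.21, 49), (-.69, 22), (-.23, 46), (-.42, 25),
    (-.15, 42), (-.26, 67), (-.46, 39), (-.51, 44), (-.63, 60), (-.65, 60),
    (-.46, 50),
    # United States Department of Labor (1970), GATB
    (-.35, 156), (-.66, 605), (-.70, 554), (-.58, 223), (-.41, 186),
    (-.50, 202), (-.52, 152), (-.67, 156), (-.61, 168), (-.43, 176),
    (.02, 149), (-.62, 157), (-.32, 136), (-.69, 119), (-.31, 183),
    (-.96, 118), (-.75, 170), (-.75, 123), (-.48, 183),
    (-.92, 3398), (-.92, 3680), (-.91, 3348), (-.91, 3491), (-.84, 3229),
    (-.87, 3395),
    # Wechsler (1974) through Reeve and Lam (2005)
    (-.48, 97), (-.66, 102), (-.21, 104), (-.79, 92), (-.53, 81), (-.29, 81),
    (-.62, 100), (-.57, 30), (-.08, 45), (-.10, 29), (-.64, 71), (-.48, 48),
    (-.73, 57), (-.27, 46), (-.18, 36), (-.22, 70), (-.45, 100), (-.57, 102),
    (-.51, 104), (.03, 88), (-.34, 123),
]


def weighted_mean_sd(pairs):
    """Sample-size-weighted mean and SD of correlations (bare-bones step only)."""
    total_n = sum(n for _, n in pairs)
    mean_r = sum(r * n for r, n in pairs) / total_n
    var_r = sum(n * (r - mean_r) ** 2 for r, n in pairs) / total_n
    return mean_r, math.sqrt(var_r)


mean_all, sd_all = weighted_mean_sd(TABLE1)
print(f"K={len(TABLE1)}, N={sum(n for _, n in TABLE1)}, "
      f"r={mean_all:.3f}, SDr={sd_all:.3f}")

# Assumption (not stated in the text): the three outliers are the positive r values.
negatives = [(r, n) for r, n in TABLE1 if r < 0]
mean_neg, sd_neg = weighted_mean_sd(negatives)
print(f"K={len(negatives)}, N={sum(n for _, n in negatives)}, "
      f"r={mean_neg:.3f}, SDr={sd_neg:.3f}")
```

With the transcribed values, the first line gives r ≈ −.80 and SDr ≈ .20, as in the first row of Table 2. Dropping the three positive correlations leaves K = 61 and N = 26,704, with r ≈ −.81 and SDr ≈ .18, which matches the second row of Table 2 and suggests (but does not prove) that these are the excluded outliers.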

Table 2 shows the results of the psychometric meta-analysis of the 64 data points. It shows (from left to right) the number of correlation coefficients (K), the total sample size (N), the mean observed correlation (r) and its standard deviation (SDr), the true correlation one can expect once artifactual error from unreliability in the g vector and the d vector and from range restriction in the g vector has been removed (ρ), and its standard deviation (SDρ). The next two columns present the percentage of variance explained by artifactual errors (%VE) and the 95% credibility interval (95% CI). This interval denotes the values one can expect for ρ in 19 out of 20 cases.

Table 2
Meta-analysis results for correlations between g loadings and gain scores after corrections for reliability and restriction of range

| Studies included | K | N | r | SDr | ρ | SDρ | %VE | 95% CI |
|---|---|---|---|---|---|---|---|---|
| All | 64 | 26,990 | −.80 | .20 | −.95 | .11 | 81 | −0.74 to −1.16 |
| All minus 3 outliers | 61 | 26,704 | −.81 | .18 | −.95 | .03 | 99 | −0.91 to −1.00 |

Note. K = number of correlations; N = total sample size; r = mean observed correlation (sample size weighted); SDr = standard deviation of observed correlation; ρ = true correlation (observed correlation corrected for unreliability and range restriction); SDρ = standard deviation of true correlation; %VE = percentage of variance accounted for by artifactual errors; 95% CI = 95% credibility interval.

The large number of data points and the very large sample size indicate that we can have confidence in the outcomes of this meta-analysis. The estimated true correlation has a value of −.95, and 81% of the variance in the observed correlations is explained by artifactual errors. However, Hunter and Schmidt (1990) state that extreme outliers should be left out of the analyses, because they are most likely the result of errors in the data. They also argue that strong outliers artificially inflate the SD of effect sizes and thereby reduce the amount of variance that artifacts can explain. We chose to leave out three outliers (more than 4 SD below the average r and more than 8 SD below ρ), comprising 1% of the research participants. This resulted in no change in the value of the true correlation, a large decrease of 74% in the SD of ρ, and a large increase of 22% in the amount of variance in the observed correlations explained by artifacts. So, when the three outliers are excluded, artifacts explain virtually all of the variance in the observed correlations. Finally, a correction for deviation from perfect construct validity in g took place, using a conservative value of .90. This resulted in a value of −1.06 for the final estimated true correlation between g loadings and score gains. Applying several corrections in a meta-analysis may lead to correlations that are larger than 1.00 or smaller than −1.00, as is the case here. Percentages of variance accounted for by artifacts larger than 100% are also not uncommon in psychometric meta-analysis, and they also occur in other methods of statistical estimation (see Hunter & Schmidt, 1990, pp. 411–414, for a discussion).
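The last steps described above (the credibility interval and the final correction for construct validity) reduce to simple arithmetic on the Table 2 summary values. The sketch below reproduces them and, for comparison, the share of observed variance that sampling error alone would explain under the usual Hunter–Schmidt formula. It is an illustration under stated assumptions, not the authors' software: the rest of the artifact variance (differences in reliability and range restriction across studies) requires artifact distributions that are not reported in this section, so it is not recomputed here. The function names are hypothetical.

```python
# Summary values for all 64 studies, copied from Table 2.
K, N_TOTAL = 64, 26_990
MEAN_R, SD_R = -0.80, 0.20
RHO, SD_RHO = -0.95, 0.11
CONSTRUCT_VALIDITY_G = 0.90  # conservative value chosen by the authors


def sampling_error_variance(mean_r, total_n, k):
    """Expected variance of r due to sampling error alone: (1 - r^2)^2 / (mean N - 1)."""
    mean_n = total_n / k
    return (1 - mean_r ** 2) ** 2 / (mean_n - 1)


def credibility_interval(rho, sd_rho, z=1.96):
    """95% credibility interval: the range expected for rho in 19 of 20 cases."""
    return rho - z * sd_rho, rho + z * sd_rho


def correct_for_construct_validity(rho, validity):
    """Divide the estimate by the construct validity of the g vector."""
    return rho / validity


pct_sampling = 100 * sampling_error_variance(MEAN_R, N_TOTAL, K) / SD_R ** 2
lower, upper = credibility_interval(RHO, SD_RHO)
final_rho = correct_for_construct_validity(RHO, CONSTRUCT_VALIDITY_G)

print(f"variance from sampling error alone: {pct_sampling:.2f}%")
print(f"95% credibility interval: {lower:.2f} to {upper:.2f}")
print(f"final estimate: {final_rho:.2f}")
```

Dividing −.95 by .90 gives −1.0556, which rounds to the reported −1.06. With the rounded Table 2 inputs, the credibility interval comes out at about −0.73 to −1.17, close to the reported −0.74 to −1.16 (presumably computed from unrounded values). Sampling error by itself accounts for less than 1% of the observed variance.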

10 Discussion

A large-scale meta-analysis of 64 test–retest studies shows that, after corrections for several artifacts, there is an estimated true correlation of −1.06 between the g loadings of tests and score gains, and that virtually all of the variance in observed correlations is attributable to these artifacts. Because the artifacts explain virtually all of the variance in the effect sizes, other dimensions on which the studies differ, such as the age of the test takers, the test–retest interval, the test used, and whether samples were of average IQ or had learning problems, play no role at all.

The estimated true correlation of −1.06 is the result of various corrections for artifacts that attenuate the correlations. The estimated values of the artifacts may underestimate or overestimate their population values; therefore, estimates of true effect sizes may overestimate or underestimate the population value of the effect size. As a solution to this problem, Hunter and Schmidt (2004) suggest carrying out several meta-analyses on the same construct and taking the average estimated effect size of all the meta-analyses. The general idea is that meta-analysis is a powerful research tool but does not give perfect outcomes.

A correlation of −1.06 falls outside the range of admissible values of a correlation, but one has to distinguish between the meta-analytical estimate of the true correlation between g and d and the true correlation between g and d itself. We interpret the meta-analytical estimate of −1.06 as meaning that the true correlation between g and d is −1.00. A correlation of −1.00 means that there is a perfect inverse relationship between g and score gains: the tests with the highest g loadings show the smallest gains. The most straightforward interpretation of this very large negative correlation is that there is no g saturation in test–retest gain scores.

11 The South African learning potential study

In a carefully carried-out study, Skuy et al. (2002) used a dynamic testing procedure to see whether it would improve the scores of Black South African students on Raven's Standard Progressive Matrices (RSPM). The Bantu Education Act of 1954 established a discriminatory educational system characterized by poorly qualified teachers, sparsely equipped and poorly funded schools, and generally poor quality. Most Black students in the sample had not received the same quality of education as White students. Black, White, Indian, and Colored research participants took the RSPM on two occasions, and in between, randomly constituted experimental groups were exposed to the Mediated Learning Experience. Both the Black South African group and the group consisting of White, Indian, and Colored South Africans improved over their baseline on the RSPM, and the Black group showed greater improvement.

The value of these cognitive interventions increases when the score gains transfer to other tests and to external criteria, such as school or work achievement. Therefore, the research participants also took Feuerstein's Representational Stencil Design Test as a transfer measure. The subject is presented with a stencil of a geometric design and is then asked to point out which stencils need to be used, and in what sequence, to construct an identical design. Like the RSPM, the Stencils test requires representational/abstract thinking, but the training on the RSPM showed little transfer to it. Moreover, the correlation of the RSPM scores with performance in the end-of-year psychology examination did not significantly improve after mediation. Once again, the score gains were empty: they did not generalize. Skuy et al. go on to ask what it was that their interventions improved. Professor Skuy made his data accessible to the present authors, so we could perform additional analyses.

12 Sample

The data from Skuy et al. (2002) were used, with the exception of data from three research participants whose pretest IQ scores were extremely low (more than 3 SDs below the group mean). Ninety-five university students in psychology, aged 16 to 29 (mean age = 20, SD = 2.3; 25 males, 70 females), participated in this study. They were 66 Black students (20 males, 46 females) and 29 White (20), Indian (6), and Colored (3) students (5 males, 24 females). The mean age of the Black group was 20 (SD = 2.5), and that of the White/Indian/Colored group was 19 years (SD = 1). Subjects were randomly assigned to the experimental group (n = 55) and to the control group (n = 40).


13 Procedure

The students participated in pre- and posttest phases, with a group intervention in between. The study focused on improvement in scores on the RSPM, using the Set Variations II of the Learning Propensity Assessment Device as the mediation task. Mediation training took 3 h and was conducted by three experienced psychologists with the assistance of six postgraduate psychology students. A detailed description is given in Skuy et al. (2002).

14 Measures and cognitive intervention

The Raven's Standard Progressive Matrices consists of 60 items (divided into 5 sets of 12 items) designed to measure the ability to form comparisons, to reason by analogy, and to organize spatial information into related wholes. It has been established as one of the purest measures of g (Jensen, 1998a). Skuy et al. (2002) found no evidence for test bias against Blacks in South African education. Rushton, Skuy, and Bons (2004) showed that the Raven's gave comparable predictive validities for students from various groups. Cross-cultural testing research has clearly shown that insufficient proficiency in the language of the test can lead to biased assessments on tests with a strong verbal component; the Raven's, however, is a non-verbal test.

The Learning Propensity Assessment Device consists of 14 exercises. Each exercise contains an initial mediation task. Subsequent tasks increase in complexity and novelty and aim to help the learner achieve mastery over the task. The purpose of mediation is to help the learner develop the appropriate cognitive strategies and functions needed for the successful completion of the task. The Set Variations II of the Learning Propensity Assessment Device consists of five sets of items, which comprise variations of Sets C, D, and E of the RSPM. Each set of variations contains a learning task for the purpose of initial mediation, followed by a series of progressively more difficult variations to which the skills learned must be applied. Mediation involves discussing with groups how to define the problem to be solved, focus on the task, set rules, regulate problem-solving behavior, and identify the correct sequence of logical steps needed to solve the task. Mediation also involves helping the subject develop appropriate concepts, verbal tools, and insights in relation to the task. A detailed description is given in Skuy et al. (2002).

15 Statistical analyses

Although the Skuy et al. study is among the South African learning potential studies with the largest sample sizes, its N is not large. We therefore chose basic statistical analyses.

15.1 Descriptive statistics

Means, standard deviations, and reliabilities were computed for the various groups. With regard to measures of effect size, Hunter and Schmidt (1990, p. 271) advise choosing the estimates of variance with the least error. Because repeated test taking tends to change the size of the SD (Ackerman, 1987), we used the SD of the pretest as the denominator. The correlation between scores before and after the training was computed to see whether the training changed the rank order of individuals' scores.
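As a worked check of this choice of denominator, the sketch below computes d = (posttest mean − pretest mean) / pretest SD from the group means and SDs reported in Table 4. The conversion to IQ points (multiplying d by an SD of 15) is our own illustrative addition; the function names are hypothetical.

```python
# Pretest mean, posttest mean, and pretest SD per group, transcribed from Table 4.
GROUPS = {
    "Black experimental": (43.78, 50.10, 6.64),
    "Black control": (45.46, 48.35, 6.69),
    "Other experimental": (50.20, 55.80, 6.05),
    "Other control": (52.71, 55.36, 3.45),
}


def effect_size(pre_mean, post_mean, pre_sd):
    """Standardized gain d: mean gain divided by the pretest SD."""
    return (post_mean - pre_mean) / pre_sd


def d_to_iq_points(d, iq_sd=15.0):
    """Express d on the conventional IQ metric (SD = 15)."""
    return d * iq_sd


for name, (pre, post, sd) in GROUPS.items():
    d = effect_size(pre, post, sd)
    print(f"{name}: d = {d:.2f} ({d_to_iq_points(d):.1f} IQ points)")
```

This reproduces the effect sizes in Table 4 (0.95, 0.43, 0.93, and 0.77). For the Black experimental group, d = 0.95 corresponds to roughly 14 IQ points, the figure used in the Discussion.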

15.2 Correlation between score gains and g loadedness

Because our sample was small and quite specific, estimates of g loadedness were taken from Lynn, Allik, and Irwing's (2004) item analysis of the RSPM in Estonia, which used a large (N = 2,735), nationally representative sample. The same reasoning as in psychometric meta-analysis applies, namely that larger samples give better estimates of g loadings than smaller samples. In a hierarchical factor analysis of the items using structural equation modeling, Lynn et al. computed g loadings for 52 of the 60 items. In the present study, Pearson correlations were calculated between the g loadings of these 52 items and the effect sizes on these items.

15.3 g loadings

The RSPM consists of dichotomous items, so we computed a matrix of polychoric correlations (Nunnally & Bernstein, 1994). A principal axis factor analysis was carried out, and the percentage of variance explained by the first unrotated factor was taken as an estimate of g loadedness. Because sample size was limited, we collapsed the experimental and the control groups.
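The factor-analytic step can be sketched in plain Python as iterated principal axis factoring restricted to the first factor. This is a minimal sketch under several assumptions: the correlation matrix is taken as given (computing polychoric correlations from item data is a separate step not shown), starting communalities are the largest absolute off-diagonal correlation in each row (one common choice among several), and the leading eigenvector is found by power iteration. The 4 × 4 matrix in the example is hypothetical, generated from known loadings so that the output can be checked; it is not RSPM data.

```python
import math


def leading_eigenpair(matrix, n_iter=1000, tol=1e-12):
    """Largest eigenvalue and unit eigenvector of a symmetric matrix via power
    iteration (assumes the dominant eigenvalue is positive and well separated)."""
    p = len(matrix)
    vec = [1.0 / math.sqrt(p)] * p
    eigval = 0.0
    for _ in range(n_iter):
        prod = [sum(matrix[i][j] * vec[j] for j in range(p)) for i in range(p)]
        new_eigval = sum(v * w for v, w in zip(vec, prod))  # Rayleigh quotient
        norm = math.sqrt(sum(x * x for x in prod))
        vec = [x / norm for x in prod]
        if abs(new_eigval - eigval) < tol:
            return new_eigval, vec
        eigval = new_eigval
    return eigval, vec


def first_factor_paf(corr, n_iter=500, tol=1e-10):
    """Iterated principal axis factoring, first unrotated factor only."""
    p = len(corr)
    # Starting communalities: largest absolute off-diagonal correlation per row.
    h2 = [max(abs(corr[i][j]) for j in range(p) if j != i) for i in range(p)]
    loadings = [0.0] * p
    for _ in range(n_iter):
        # Reduced correlation matrix: communalities replace the unit diagonal.
        reduced = [[h2[i] if i == j else corr[i][j] for j in range(p)]
                   for i in range(p)]
        eigval, vec = leading_eigenpair(reduced)
        loadings = [v * math.sqrt(eigval) for v in vec]
        new_h2 = [x * x for x in loadings]
        change = max(abs(a - b) for a, b in zip(new_h2, h2))
        h2 = new_h2
        if change < tol:
            break
    # Share of the total variance of p standardized variables taken by the factor.
    explained = sum(h2) / p
    return loadings, explained


# Hypothetical example: a one-factor correlation matrix built from known loadings.
TRUE_LOADINGS = [0.8, 0.7, 0.6, 0.5]
CORR = [[1.0 if i == j else TRUE_LOADINGS[i] * TRUE_LOADINGS[j]
         for j in range(4)] for i in range(4)]

loadings, explained = first_factor_paf(CORR)
print([round(x, 3) for x in loadings], f"{100 * explained:.1f}% of variance")
```

For this constructed matrix, the recovered loadings are close to 0.8, 0.7, 0.6, and 0.5, and the first factor explains about 43.5% of the total variance (the sum of squared loadings divided by the number of variables). This percentage is the quantity the study reports as 22% for the RSPM pretest and 18% for the posttest.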

15.4 Correlation between sum scores and score gains

We tested whether individuals with low g improved their scores more than those with high g by correlating gain scores with pretest RSPM scores for each of the four research groups. As gain scores tend to be negatively correlated with pretest scores as a function of unreliability (see Cronbach, 1990; Nunnally & Bernstein, 1994), we corrected the correlations using Tucker, Damarin, and Messick's (1966) formula 63. Using this formula, one adds to each correlation the term (SD_pretest / SD_gain score) × (1 − reliability_pretest).

Table 3
Proportion of sample selecting the correct answer on items of Raven's Standard Progressive Matrices, by group

| Set A item | Black | Other^a | Set B item | Black | Other | Set C item | Black | Other | Set D item | Black | Other | Set E item | Black | Other |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1.00 | 1.00 | 13 | 1.00 | 1.00 | 25 | 1.00 | .97 | 37 | 1.00 | 1.00 | 49 | .74 | .90 |
| 2 | .97 | 1.00 | 14 | 1.00 | 1.00 | 26 | .96 | 1.00 | 38 | .99 | 1.00 | 50 | .64 | .90 |
| 3 | .97 | 1.00 | 15 | 1.00 | 1.00 | 27 | .96 | 1.00 | 39 | .89 | 1.00 | 51 | .79 | .97 |
| 4 | 1.00 | .97 | 16 | .91 | .97 | 28 | .86 | .93 | 40 | .92 | 1.00 | 52 | .56 | .83 |
| 5 | 1.00 | 1.00 | 17 | .96 | .97 | 29 | .94 | .97 | 41 | .96 | 1.00 | 53 | .52 | .83 |
| 6 | .99 | 1.00 | 18 | .85 | 1.00 | 30 | .76 | .83 | 42 | .92 | 1.00 | 54 | .35 | .76 |
| 7 | .94 | .97 | 19 | .77 | .66 | 31 | .88 | .97 | 43 | .77 | 1.00 | 55 | .42 | .79 |
| 8 | .91 | .93 | 20 | .79 | .97 | 32 | .50 | .79 | 44 | .76 | .93 | 56 | .21 | .69 |
| 9 | 1.00 | .97 | 21 | .83 | .97 | 33 | .74 | .90 | 45 | .71 | .97 | 57 | .30 | .41 |
| 10 | .91 | .97 | 22 | .92 | 1.00 | 34 | .61 | .79 | 46 | .79 | .93 | 58 | .12 | .41 |
| 11 | .83 | .90 | 23 | .80 | .90 | 35 | .53 | .69 | 47 | .29 | .41 | 59 | .02 | .17 |
| 12 | .68 | .83 | 24 | .59 | .83 | 36 | .06 | .35 | 48 | .26 | .38 | 60 | .11 | .21 |

^a Other = White, Indian, and Colored.
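The reason for the Tucker, Damarin, and Messick correction can be spelled out in one step. Measurement error in the pretest X enters the gain G = Y − X with the opposite sign, so cov(X, G) contains a spurious −σ²_e term. Adding back σ²_e = SD²_pretest × (1 − reliability_pretest) and dividing by SD_pretest × SD_gain gives exactly the term stated above. A minimal sketch follows; the function name and the inputs in the example are hypothetical, chosen only to show the arithmetic, and are not values from the study.

```python
def tucker_corrected_r(r_pre_gain, sd_pretest, sd_gain, reliability_pretest):
    """Remove the negative bias in the pretest-gain correlation that is
    caused by pretest measurement error.

    Adds (SD_pretest / SD_gain) * (1 - reliability_pretest) to the observed r.
    """
    return r_pre_gain + (sd_pretest / sd_gain) * (1.0 - reliability_pretest)


# Hypothetical inputs, for illustration only.
corrected = tucker_corrected_r(-0.60, sd_pretest=6.0, sd_gain=4.0,
                               reliability_pretest=0.86)
print(round(corrected, 2))
```

With these made-up inputs, the correction term is 1.5 × 0.14 = 0.21, so an observed −.60 becomes −.39. The less reliable the pretest, or the smaller the SD of the gains relative to the pretest SD, the larger the adjustment.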

16 Results

16.1 Descriptive statistics

Internal consistencies (Cronbach's αs) on the RSPM ranged from .76 to .86 for the pre- and posttests, respectively. Table 3 shows the proportion of each group that selected the correct answer on each of the 60 items of the pretest. Across the 60 items, the order of the p values was almost identical for Blacks and White/Indian/Coloreds (r = .92, p = .00).
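The item-level similarity reported here can be checked from Table 3. The sketch below assumes a plain Pearson correlation between the two columns of item p values. The text speaks of the "order" of the p values, so a rank-order coefficient is also possible. The values are the Table 3 proportions expressed as percentages, which does not affect the correlation. The helper name `pearson_r` is our own.

```python
import math

# Percentage correct per item (items 1-60 in order), transcribed from Table 3.
BLACK = [
    100, 97, 97, 100, 100, 99, 94, 91, 100, 91, 83, 68,    # Set A
    100, 100, 100, 91, 96, 85, 77, 79, 83, 92, 80, 59,     # Set B
    100, 96, 96, 86, 94, 76, 88, 50, 74, 61, 53, 6,        # Set C
    100, 99, 89, 92, 96, 92, 77, 76, 71, 79, 29, 26,       # Set D
    74, 64, 79, 56, 52, 35, 42, 21, 30, 12, 2, 11,         # Set E
]
OTHER = [  # White, Indian, and Colored
    100, 100, 100, 97, 100, 100, 97, 93, 97, 97, 90, 83,
    100, 100, 100, 97, 97, 100, 66, 97, 97, 100, 90, 83,
    97, 100, 100, 93, 97, 83, 97, 79, 90, 79, 69, 35,
    100, 100, 100, 100, 100, 100, 100, 93, 97, 93, 41, 38,
    90, 90, 97, 83, 83, 76, 79, 69, 41, 41, 17, 21,
]


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


print(f"r = {pearson_r(BLACK, OTHER):.3f}")
```

With the transcribed values, the result is close to the reported r = .92. Most of the covariation comes from the harder items in Sets C, D, and E, where both groups drop well below ceiling.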

Table 4 shows the means and standard deviations for the total RSPM scores for the four groups, along with the d effect sizes representing the difference between pre- and posttest scores (Cohen, 1988). First, we examined whether there was an effect of race (Black vs. White/Indian/Colored) and group (experimental vs. control) on the pretest scores. There was a significant effect due to race (F(1, 91) = 24.13, p = .00, η² = .21) but not group (F(1, 91) = 2.28, p = .14, η² = .02). This means that mean pretest scores of Blacks (M = 44.44, SD = 6.65) were lower than those of White/Indian/Coloreds (M = 51.41, SD = 5.05), and that mean pretest scores of the experimental and control groups were comparable (M = 45.53, SD = 7.04 and M = 48, SD = 6.7, respectively).

Table 4
Pre- and posttest mean Raven's scores, standard deviations, and mean effect sizes for Black and White/Indian/Colored students

| Group | Pretest M | Pretest SD | Pretest percentile | Posttest M | Posttest SD | Posttest percentile | Effect size |
|---|---|---|---|---|---|---|---|
| Black experimental (n = 40) | 43.78 | 6.64 | 14 | 50.10 | 5.31 | 41 | 0.95 |
| Black control (n = 26) | 45.46 | 6.69 | 16 | 48.35 | 6.71 | 31 | 0.43 |
| Other^a experimental (n = 15) | 50.20 | 6.05 | 41 | 55.80 | 3.76 | 75 | 0.93 |
| Other^a control (n = 14) | 52.71 | 3.45 | 55 | 55.36 | 3.43 | 68 | 0.77 |

Note. Means and SDs are raw scores. Percentiles are based on US adult norms; see Raven, Raven, and Court's (2000) Table SPM13.
^a Other = White, Indian, and Colored.

Second, we investigated the effects of training on the posttest scores by performing a two-way ANCOVA on the total posttest scores, with race and group as factors and the total pretest scores as the covariate. There was a significant effect for group (F(1, 95) = 13.81, p = .00, η² = .13) and for race (F(1, 90) = 3.99, p = .05, η² = .04), but not for the two-way interaction of group and race (F(1, 90) = 0.28, p = .60, η² = .00). These results indicate that the training was equally effective for the Black and the White/Indian/Colored students. Posttest scores of Blacks (M = 49.41, SD = 5.91), however, remained significantly lower (F(1, 91) = 28.33, p = .00) than those of Whites/Indians/Coloreds (M = 55.59, SD = 3.55). Although posttest scores of the experimental group (M = 51.65, SD = 5.53) were higher than those of the control group (M = 50.8, SD = 6.65), the difference between the groups was nonsignificant (F(1, 91) = 0.85, p = .36).
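The text does not say which variant of η² is reported. As a hedged consistency check, assuming partial η², the value can be recovered from each F statistic and its degrees of freedom as η²_p = F·df_effect / (F·df_effect + df_error). The sketch below applies this to the F values above; the function name is hypothetical.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# (effect, F, df_effect, df_error, reported eta squared), taken from the text above.
REPORTED = [
    ("race, pretest", 24.13, 1, 91, 0.21),
    ("group, pretest", 2.28, 1, 91, 0.02),
    ("group, ANCOVA", 13.81, 1, 95, 0.13),
    ("race, ANCOVA", 3.99, 1, 90, 0.04),
    ("group x race, ANCOVA", 0.28, 1, 90, 0.00),
]

for label, f, df1, df2, reported in REPORTED:
    eta = partial_eta_squared(f, df1, df2)
    print(f"{label}: {eta:.2f} (reported {reported:.2f})")
```

All five recomputed values round to the reported ones, which is consistent with (though not proof of) partial η² having been reported.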

The correlation between scores before and after the training was .84 (p = .00) for the experimental group and .90 (p = .00) for the control group, showing that the training had only a limited effect on the rank order of individuals' scores. This means that the test measures largely, but not perfectly, the same constructs on both occasions.

16.2 Correlation between score gains and g loadedness

We estimated effect sizes for each of the four groups (race by condition) by computing the difference between the mean pretest and posttest scores, divided by the standard deviation of the pretest scores of the Black and the White/Indian/Colored students, respectively. We then calculated the correlations between these effect sizes and the g loadings taken from Lynn et al. The correlations were −.24 (p = .10) for the Black experimental group, −.21 (p = .20) for the White/Indian/Colored experimental group, −.08 (p = .59) for the Black control group, and −.41 (p = .01) for the White/Indian/Colored control group. Small sample sizes usually attenuate correlations (Hunter & Schmidt, 1990). Collapsing the groups indeed resulted in higher average correlations: −.39 for the complete experimental group and −.26 for the complete control group.

16.3 g loadings

Using the combined experimental and control groups, principal axis factor analyses on the pretest and the posttest scores resulted in a first unrotated factor explaining 22% of the variance in the pretest scores and 18% of the variance in the posttest scores. These findings suggest that the g loadedness of the RSPM decreased substantially after the Mediated Learning Experience.

16.4 Correlation between score gains and sum score

Correlating score gains with RSPM total scores resulted in values of −.60 (p = .00) for the Black experimental group, −.18 (p = .38) for the Black control group, −.82 (p = .00) for the White/Indian/Colored experimental group, and −.48 (p = .08) for the White/Indian/Colored control group. After applying the correction formula of Tucker et al. (1966), these correlations became −.39, −.08, −.61, and −.35, respectively. Overall, these correlations show that low-g persons improved their scores more strongly than high-g persons.

17 Discussion

Skuy et al. (2002) hypothesized that the low-quality education of Blacks in South Africa would lead IQ tests to underestimate their cognitive abilities. Groups of Black and White/Indian/Colored students took the Raven's Progressive Matrices twice and in between received Feuerstein's Mediated Learning Experience. The test scores went up substantially in all groups. Evidence for an authentic change in the g factor requires broad transfer, that is, generalizability across a wide variety of cognitive performance. However, Skuy et al. showed that the gains did not generalize to scores on another, highly similar test or to external criteria, and were therefore hollow. As the score gains were in some cases quite large (14 IQ points for the Black experimental group), the question becomes: what is it that improved?

The findings show that the correlations between score gains and the g loadedness of the items were −.39 for the complete experimental group and −.26 for the complete control group. However, because the g loadings and gain scores are measured at the item level, their reliabilities are not high, which substantially attenuates the correlation between g and d. Moreover, the RSPM does not measure g perfectly: Jensen (1998a, p. 91) estimates its g loading at .83. When we estimate the reliability of the g vector at .70 and the reliability of the gain score vector at .50, corrections for unreliability and for deviation from perfect construct validity of g alone would result in estimated true correlations of −.80 and −.53, respectively. These values should be taken as underestimates; controlling for additional artifacts would bring them closer to the very strong negative correlation found in the meta-analysis.
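The corrections in the preceding paragraph follow the classical disattenuation formula: divide the observed correlation by the square root of the product of the two vector reliabilities and by the construct validity of the g measure. A minimal sketch with the values stated in the text is shown below; the function name is hypothetical.

```python
import math


def disattenuate(r_observed, rel_x, rel_y, construct_validity=1.0):
    """Correct a correlation for unreliability in both vectors and for
    imperfect construct validity of one of them."""
    return r_observed / (math.sqrt(rel_x * rel_y) * construct_validity)


# Reliabilities of .70 (g vector) and .50 (gain vector), and Jensen's (1998a)
# estimate of .83 for the g loading of the RSPM, as stated in the text.
for label, r in [("experimental", -0.39), ("control", -0.26)]:
    print(label, round(disattenuate(r, 0.70, 0.50, 0.83), 2))
```

This gives about −.79 and −.53. The small difference from the reported −.80 presumably reflects rounding of the observed correlation that was fed in.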

The findings suggest that after training, the g loadedness of the test decreased substantially. We found substantial negative correlations between gain scores and RSPM total scores. Table 4 shows that the total score variance decreased after training, which is in line with low-g subjects improving more than high-g subjects. As a rule, high-g individuals profit the most from training, as reflected in the ubiquitous positive correlation between IQ scores and training performance (Jensen, 1980; Schmidt & Hunter, 1998). These findings could therefore be interpreted as an indication that Feuerstein's Mediated Learning Experience is not g-loaded, in contrast with regular training programs, which are clearly g-loaded. Substantial negative correlations between gain scores and RSPM total scores are no definite proof of this hypothesis, but they are in line with it. Further support for our hypothesis that the Feuerstein training has little or no g loadedness comes from Coyle (2006), who showed that gain scores loaded virtually zero on the g factor. Moreover, Skuy et al. reported that the predictive validity of their measure did not increase when the second Raven score was used. The fact that individuals with low g gained more than those with high g could be interpreted as an indication that the Mediated Learning Experience was not g-loaded. It should be noted, however, that Feuerstein most likely did not intend his intervention to be g-loaded; he was interested in increasing the performance of low scorers on both tests and external criteria.

18 General discussion

IQ scores are by far the best general predictor of success in education, job training, and work. However, there are many ways in which these IQ scores can be increased, for instance by means of retesting or participating in a learning potential training program. What conclusions can be drawn from such score gains? Jensen's (1998a) hypothesis that the effects of training on abilities can be summarized in terms of Carroll's three-stratum hierarchical factor model was tested in a meta-analysis of test–retest data from Dutch, British, and American test batteries, and with learning potential data from South Africa using Raven's Progressive Matrices. The meta-analysis convincingly shows that test–retest score gains are not g-loaded. The findings from the learning potential study are clearly in line with this: when the attenuation caused by unreliability and other artifacts is taken into account, the correlation between the g loadings of items and the gains on items is somewhat comparable to the one found in the meta-analysis of test batteries. The data suggest that the g loadedness of item scores decreases after the intervention training. Te Nijenhuis et al.'s (2001) finding that practice and coaching reduced the g loadedness of their test scores strengthens the present findings using item scores. The findings also show that it is not the high-g participants who increase their scores the most, as is common in training situations, but the low-g persons. This suggests that the intervention training is not g-loaded.

Our findings fit quite well with the hierarchical model of intelligence. The generalizability of test scores resides predominantly in the g component, whereas the test-specific ability component and the narrow ability component are virtually non-generalizable. This is evidenced, for instance, by the earlier finding that adding verbal tests or numerical tests to a g score resulted in only a very small incremental validity (Ree & Earles, 1991; Ree et al., 1994). Additionally, Ericsson and Lehmann (1996) reported immense gains on a memory task focusing on one narrow ability, but did not find any improvement on comparable memory tasks focusing on another narrow ability. As the score gains are not related to g, the generalizable g component decreases, and since the Feuerstein training itself is quite possibly not g-loaded, it is easy to understand why the score gains did not generalize to scores on the cognitively loaded Representational Stencil Design Test. For a similar reason, the score gains did not generalize to g-loaded external criteria: the correlation of the RSPM scores with performance in the end-of-year psychology examination did not significantly improve after mediation. Reeve and Lam (2005) claimed that retesting does not change the nature of what is being tested, but our findings suggest the opposite.

19. Limitations of the studies

Our meta-analysis and our analysis of the South African study rely heavily on the method of correlated vectors (MCV), which has recently been shown to have limitations. Dolan and Lubke (2001) have shown that, when groups are compared, substantial positive vector correlations can still be obtained even when the groups differ not only on g but also on factors uncorrelated with g. Ashton and Lee (2005) show that associations of a variable with non-g sources of variance can produce a vector correlation of zero even when the variable is strongly associated with g. They suggest that the g loadings of a subtest are sensitive to the nature of the other subtests in a battery, so that a specific sample of subtests may cause a spurious correlation between the vectors. Notwithstanding these limitations, studies using MCV continue to appear (see, for instance, Colom, Jung, & Haier, in press; Hartmann, Kruuse, & Nyborg, in press; Lee et al., 2006). The outcomes of our meta-analysis of a large number of studies using MCV may contribute to the discussion of the method's limitations.
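The core computation in MCV is sketched below. It assumes the usual Pearson correlation across subtests; some studies use a rank-order correlation instead. The six-subtest battery is hypothetical. In line with Ashton and Lee's (2005) point, the value obtained depends on which subtests happen to be in the battery.

```python
import numpy as np


def correlated_vectors(g_loadings, gains) -> float:
    """Method of correlated vectors: Pearson correlation between a battery's
    vector of subtest g loadings and its vector of subtest score gains (d)."""
    g = np.asarray(g_loadings, dtype=float)
    d = np.asarray(gains, dtype=float)
    if g.shape != d.shape or g.ndim != 1 or g.size < 3:
        raise ValueError("need two equal-length vectors covering at least 3 subtests")
    return float(np.corrcoef(g, d)[0, 1])


# Hypothetical six-subtest battery: the more g-loaded subtests show the
# smaller retest gains, so the vector correlation is strongly negative.
g_loadings = [0.80, 0.75, 0.70, 0.60, 0.55, 0.45]
gains_d = [0.10, 0.20, 0.15, 0.30, 0.35, 0.40]
r_gd = correlated_vectors(g_loadings, gains_d)
print(f"r(g, d) = {r_gd:.2f}")  # about -0.96 for these made-up values
```

Each study in a meta-analysis of MCV results contributes one such vector correlation.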

A principle of meta-analysis is that the amount of information contained in one individual study is quite modest. Therefore, one should analyze all studies on one topic together and correct for artifacts, which strongly increases the amount of information. Our meta-analytical value of r = −1.06 is virtually identical to the theoretically expected correlation between g and d of −1.00. This holds some promise that a psychometric meta-analysis of studies using MCV is a powerful way of reducing some of the limitations of MCV. An alternative methodological approach is to limit oneself to the rare datasets that allow the use of structural equation modeling. However, from a meta-analytical point of view, these studies yield only a quite modest amount of information.

296 J. te Nijenhuis et al. / Intelligence 35 (2007) 283–300

Additional meta-analyses of studies employing MCV are necessary to establish the validity of the combination of MCV and psychometric meta-analysis. Most likely, many would agree that a high positive meta-analytical correlation between measures of g and measures of another construct implies that g plays a major role, and that a meta-analytical correlation of −1.00 implies that g plays no role. However, it is not clear what value of the meta-analytical correlation to expect from MCV when g plays only a modest role. After the present meta-analysis on a construct that clearly has an inverse relationship with g, it would be informative to carry out meta-analyses of studies on variables that are strongly linked to g and on variables that are modestly linked to g. An example of the latter would be secular score gains, which according to Lynn's (1990) nutrition theory should be modestly g-loaded.

The sample sizes in the South African study are not large, but they are still larger than those in many other studies of learning potential, where an N ≈ 10 is not unusual. Reanalyzing the many existing studies on dynamic testing could yield a meta-analysis with a large combined N. The mean posttest score was quite high. A ceiling effect may therefore have occurred for the White/Indian/Colored group, leading to an underestimation of the experimental score gain for this group.
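The ceiling argument can be illustrated with a small simulation. It assumes normally distributed scores, an identical true gain for every examinee, and a hard maximum of 60 raw-score points (the number of RSPM items). The group means, SD, gain, and function name are hypothetical and are not taken from the Skuy et al. data.

```python
import numpy as np


def mean_observed_gain(pre_mean, true_gain, sd=6.0, max_score=60, n=10_000, seed=0):
    """Mean observed pretest-to-posttest gain on a test with a hard ceiling.

    Simplifying assumptions: normally distributed true scores, the same true
    gain for every examinee, and observed scores rounded and clipped to the
    0..max_score range (60 items, as on the RSPM).
    """
    rng = np.random.default_rng(seed)
    pre_true = rng.normal(pre_mean, sd, n)
    post_true = pre_true + true_gain
    pre_obs = np.clip(np.round(pre_true), 0, max_score)
    post_obs = np.clip(np.round(post_true), 0, max_score)
    return float(np.mean(post_obs - pre_obs))


# Two hypothetical groups with the same true gain of 5 raw-score points.
far_from_ceiling = mean_observed_gain(pre_mean=35, true_gain=5)
near_ceiling = mean_observed_gain(pre_mean=55, true_gain=5)
print(f"far from ceiling: {far_from_ceiling:.2f}; near ceiling: {near_ceiling:.2f}")
```

The group that starts near the maximum shows a clearly smaller observed gain, although its true gain is identical. That is the direction of bias suggested for the White/Indian/Colored group.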

Instead of testing the hypothesis with a strongly unidimensional test such as the RSPM, it would be better to use a multidimensional test. Moreover, a large sample size would allow the use of more rigorous data-analytical techniques, leading to more definitive results. However, to the best of our knowledge, datasets meeting these requirements do not exist, and the Skuy et al. study is arguably the best South African learning potential study.

20. Score gains as low-quality measures of motivation

As criterion-related validity is strongly dependent on g, te Nijenhuis et al.'s (2001) finding of lowered g loadings after training should result in lowered criterion-related validity. However, the empirical findings show the opposite. Virtually all test–retest and test preparation studies on cognitive tests and scholastic aptitude tests that reported criterion-related validities for both the first and a later test score demonstrate small to modest increases in criterion-related validity for the second or third test score (see Allalouf & Ben-Shakhar, 1998; Bashi, 1976; Coyle, 2006; Hausknecht, Trevor, & Farr, 2002; Jones, 1986; Linn, 1977; Olsen & Schrader, 1959; Ortar, 1960; Powers, 1985; Reeve & Lam, 2005). Allalouf and Ben-Shakhar (1998) carefully designed a study of a university entrance test in which the experimental group received an intensive 40-h test coaching program and the control group did not. The criterion-related validity of the retest increased for both groups. Most importantly, the increase was the same; it was not larger for the experimental group.

In a little-known but carefully designed large-scale learning potential study, Resing (1990; see Table 4.23) compared an experimental group that received a pretest, a learning potential training, and a posttest against a control group that received only the pretest and the posttest. The mean criterion-related validity of the various second scores was .62 for both the experimental and the control group. Learning potential training therefore did not result in incremental criterion-related validity over and above the validity resulting from simply retesting. The findings of both Resing and Allalouf and Ben-Shakhar suggest that cognitive interventions do not increase criterion-related validity more than simple retesting does.

g and the personality measure conscientiousness have been shown to make an excellent combination of predictors (Schmidt & Hunter, 1998). Conscientiousness represents, among other characteristics, persistence, a will to achieve, and the ability to focus effort on a goal. A field study of test preparation among actual job applicants (Clause, Delbridge, Schmitt, Chan, & Jennings, 2001) showed that motivation to perform well on the test correlated .25 with test performance. One could speculate that score increases do not reflect a true cognitive component but rather become low-quality measures of motivation. Further, the increase in validity due to retesting and learning potential training is modest compared with the large increase obtainable from personality questionnaires. Personality testing might therefore provide a less expensive and more accurate alternative.

21. Effectiveness of various training formats

Components of the mediation training used by Skuy et al. (2002) are similar to the test training used by te Nijenhuis et al. (2001). Both the Dutch training and the South African training took 3 h. However, the Dutch training focused on two different test formats, whereas the South African training dealt with only one. The test training by Lloyd and Pidgeon (1961) took even less time, namely two half-hour segments, each focusing on one test format. The effect sizes in all studies were roughly comparable. This suggests that the methodologies employed by te Nijenhuis et al. and by Lloyd and Pidgeon were more efficient than the one used by Skuy et al. The components of the mediation training that are absent from the other two training formats may be ineffective in raising test scores and could therefore be left out. If so, it might be possible to increase scores on the RSPM by one SD with a relatively simple 1-h training.
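Comparing training formats by effect size presupposes a common metric for score gains. The sketch below shows one such metric: the mean gain divided by the pretest SD, so that a value of 1.0 corresponds to the one-SD increase mentioned above. The choice of standardizer is an assumption, since the studies compared here may have used a pooled SD or another variant. The scores and the function name are hypothetical.

```python
from statistics import mean, stdev


def standardized_gain(pre, post):
    """Standardized mean gain d = (mean(post) - mean(pre)) / SD(pre).

    The pretest SD is used as the standardizer here; a pooled SD or the SD
    of the difference scores are common alternatives.
    """
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("need paired pretest and posttest scores for 2+ examinees")
    return (mean(post) - mean(pre)) / stdev(pre)


# Hypothetical RSPM raw scores for five examinees before and after a training.
pre = [38, 42, 45, 47, 50]
post = [44, 47, 49, 50, 52]
print(f"d = {standardized_gain(pre, post):.2f}")
```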

22. Generalizability of findings

Can these findings of hollow score gains after test–retest, test practice, and Mediated Learning Experience training be generalized to other studies in which training-induced score gains were found? Ericsson and Lehmann (1996) reported tremendous score increases after intensive training on numeric memory tests, but these gains did not generalize in the least to verbal memory tests. Gains on one narrow ability that do not generalize to another narrow ability clustering under the same broad ability are hollow. Similarly, Jensen (1998b) showed that score gains due to adoption were not on the g factor and were therefore most likely hollow.

Rushton (1999) argued that intergenerational score gains are not linked to g, suggesting that the Flynn effect may be empty, but he was strongly criticized by Flynn (1999, 2000). In cross-sectional studies of the Flynn effect, score gains are largest on the RSPM (Flynn, 1987). Lynn (1998) has suggested that a substantial part of these intergenerational score gains on the RSPM is generalizable (it does reflect higher g), but that the remaining part is hollow and should be interpreted as a schooling effect. The RSPM does require the application of the mathematical principles of addition, subtraction, progression, and the distribution of values. In the three decades (1950s–1980s) over which these increases in RSPM scores occurred, increasing proportions of 15- to 18-year-olds remained in school. There they learned math skills that they could apply to the solution of matrices problems. Our findings could be interpreted as support for Lynn's hypothesis that score gains on the RSPM are partially hollow. Notwithstanding the high g loading of its sum score, the RSPM is quite sensitive to test–retest effects and training effects. Some studies on the Flynn effect (Lynn & Hampson, 1986; Teasdale & Owen, 1989) show that the increase in scores is largely concentrated in the lower segments of the IQ distribution. Our finding that low scorers show the largest gains after training may additionally support the notion that part of the Flynn effect on the RSPM is hollow. Finally, Wicherts et al.'s (2004) findings show that in some of their datasets the secular score gains are most strongly linked to broad, narrow, and test-specific abilities. This indicates that an important part of the gains is non-generalizable.

Ceci (1991) showed that increased schooling leads to higher IQ scores, but are these gains highly specific or predominantly generalizable? It would be interesting to apply the techniques used in this study to the findings of previous intervention studies. Biological interventions (such as diet, vitamin supplements, and vaccination against infectious disease) may be a more cost-effective method of producing true changes in g and broad abilities than psychological or educational interventions. There may be a biological barrier between the first stratum and the second stratum that restricts the effects of behavioral interventions to narrow abilities and test specificities.

Acknowledgement

We would like to thank Mervyn Skuy for permission to use his dataset.

Thanks to Marié de Beer, Raegan Murphy, Welko Tomic, Art Jensen, and Frank Schmidt for feedback on previous versions of this paper.

Thanks also to Arne Evers, Wilma Resing (Dutch Test Committee), and Andress Kooij (Harcourt) for helping to locate test–retest studies.

References

Ackerman, P. L. (1986). Individual differences in information processing: An investigation of intellectual abilities. Intelligence, 10, 101–139.

Ackerman, P. L. (1987). Individual differences in skill learning: An integration of psychometric and information processing skills. Psychological Bulletin, 102, 3–27.

Allalouf, A., & Ben-Shakhar, G. (1998). The effect of coaching on the predictive validity of scholastic aptitude tests. Journal of Educational Measurement, 35(1), 31–47.

Ashton, M. C., & Lee, K. (2005). Problems with the method of correlated vectors. Intelligence, 33, 431–444.

Bashi, Y. (1976). Verbal and non-verbal abilities of 4th, 6th, and 8th grade students in the Arab educational system in Israel. Jerusalem: Hebrew University, School of Education.


Bleichrodt, N., Resing, W. C. M., Drenth, P. J. D., & Zaal, J. N. (1987). Intelligentie-meting bij kinderen: Empirische en methodologische verantwoording van de geReviseerde Amsterdamse Kinder Intelligentie Test [Measuring the intelligence of children: Empirical and methodological justification of the Revised Amsterdam Children Intelligence Test]. Lisse, the Netherlands: Swets.

Bennett, G. K., Seashore, H. G., & Wesman, A. G. (1974). Differential Aptitude Tests (5th ed.). Manual. New York: The Psychological Corporation.

Boeyens, J. C. A. (1989). Learning potential: An empirical investigation. Pretoria, South Africa: Human Science Research Council.

Bosch, F. (1973). Inventarisatie, beschrijving en onderzoek m.b.t. de wijzigingen van de GATB, incl. test-hertest onderzoek (No. Pz3bRp0120) [Stock-taking, description, and research concerning the modifications of the GATB, including a test–retest study]. Utrecht, the Netherlands: Nederlandse Spoorwegen.

Carroll, J. B. (1993). Human cognitive abilities: A survey of factor analysis studies. Cambridge University Press.

Ceci, S. J. (1991). How much does schooling influence general intelligence and its cognitive components? A reassessment of the evidence. Developmental Psychology, 27, 703–722.

Christian, K., Bachnan, H. J., & Morrison, F. J. (2001). Schooling and cognitive development. In R. J. Sternberg & E. L. Grigorenko (Eds.), Environmental effects on cognitive abilities (pp. 287–335). Mahwah, NJ: Erlbaum.

Clause, C. S., Delbridge, K., Schmitt, N., Chan, D., & Jennings, D. (2001). Test preparation activities and employment test performance. Human Performance, 14, 149–167.

Cohen, J. (1988). Statistical power analysis for the behavioral sciences. Hillsdale: Lawrence Erlbaum.

Colom, R., Jung, R. E., & Haier, R. J. (in press). Finding the g-factor in brain structure using the method of correlated vectors. Intelligence.

Covin, T. A. (1977). Stability of the WISC-R for 9-year-olds with learning difficulties. Psychological Reports, 40, 1297–1298.

Coyle, T. R. (2006). Test–retest changes on scholastic aptitude tests are not related to g. Intelligence, 34, 15–27.

Cronbach, L. J. (1990). Essentials of psychological testing. New York: HarperCollins.

de Villiers, A. B. (1999). Disadvantaged students' academic performance: Analysing the zone of proximal development. Unpublished DPhil thesis, University of Cape Town, South Africa.

de Wolff, C. J., & Buiten, B. (1963). Een factoranalyse van vier testbatterijen [A factor analysis of four test batteries]. Nederlands Tijdschrift voor Psychologie, 18, 220–239.

Dolan, C. V., & Lubke, G. (2001). Viewing Spearman's hypothesis from the perspective of multigroup PCA: A comment on Schonemann's criticism. Intelligence, 29, 231–245.

Drenth, P. J. D., Petrie, J. F., & Bleichrodt, N. (1968). Handleiding bij de Amsterdamse Kinder Intelligentie Test [Manual of the Amsterdam Children Intelligence Test]. Amsterdam: Vrije Universiteit.

Elliott, C. D. (1983). British Ability Scales. Manual 2: Technical handbook. Windsor, Great Britain: NFER-Nelson.

Engelbrecht, M. (1999). Leerpotensiaal as voorspeller van akademiese sukses van universiteitsstudente [Learning potential as predictor of the academic success of university students]. Unpublished DPhil thesis, Potchefstroom University for Christian Higher Education, South Africa.

Ericsson, K. A., & Lehmann, A. C. (1996). Expert and exceptional performance: Evidence of maximal adaptation to task constraints. Annual Review of Psychology, 47, 273–305.

Evers, A., & Lucassen, W. (1991). Handleiding DAT 83: Differentiële Aanleg Testserie [Manual DAT 83: Differential Aptitude Test series]. Amsterdam: Swets.

Fleishman, E. A., & Hempel, W. E. (1955). The relation between abilities and improvement with practice in a visual discrimination reaction task. Journal of Experimental Psychology, 49, 301–312.

Flynn, J. R. (1987). Massive IQ gains in 14 nations: What IQ tests really measure. Psychological Bulletin, 101, 171–191.

Flynn, J. R. (1999). Evidence against Rushton: The genetic loading of WISC-R subtests and the causes of between-group IQ differences. Personality and Individual Differences, 26, 373–379.

Flynn, J. R. (2000). IQ gains, WISC subtests and fluid g: g theory and the relevance of Spearman's hypothesis to race. In G. R. B. J. Goode (Ed.), The nature of intelligence (pp. 202–227). New York: Wiley.

Gaydon, V. P. (1988). Predictors of performance of disadvantaged adolescents on the Soweto/Alexandra gifted child programme. Unpublished MEd dissertation, University of the Witwatersrand, South Africa.

Gottfredson, L. S. (1997). Why g matters: The complexity of everyday life. Intelligence, 24(1), 79–132.

Gottfredson, L. S. (2002). g: Highly general and highly practical. In R. J. Sternberg & E. L. Grigorenko (Eds.), The general intelligence factor: How general is it? (pp. 331–380). Mahwah, NJ: Erlbaum.

Grigorenko, E. L., & Sternberg, R. J. (1998). Dynamic testing. Psychological Bulletin, 124, 75–111.

Haeck, W., Yeld, N., Conradie, J., Robertson, N., & Shall, A. (1997). A developmental approach to mathematics testing for university admissions and course placement. Educational Studies in Mathematics, 33, 71–91.

Hartmann, P., Kruuse, N. H. S., & Nyborg, H. (in press). Testing the cross-racial generality of Spearman's hypothesis in two samples. Intelligence.

Hausknecht, J. P., Trevor, C. O., & Farr, J. L. (2002). Retaking ability tests in a selection setting: Implications for practice effects, training performance, and turnover. Journal of Applied Psychology, 87(2), 243–254.

Hunter, J. E., & Schmidt, F. L. (1990). Methods of meta-analysis. London: Sage.

Hunter, J. E., & Schmidt, F. L. (2004). Methods of meta-analysis (2nd ed.). London: Sage.

Jensen, A. R. (1980). Bias in mental testing. London: Methuen.

Jensen, A. R. (1985). The nature of the black–white difference on various psychometric tests: Spearman's hypothesis. Behavioral and Brain Sciences, 8, 193–263.

Jensen, A. R. (1998a). The g factor: The science of mental ability. London: Praeger.

Jensen, A. R. (1998b). Adoption data and two g-related hypotheses. Intelligence, 25, 1–6.

Johnson, W., Bouchard, T. J., Krueger, R. F., Jr., McGue, M., & Gottesman, I. I. (2004). Just one g: Consistent results from three test batteries. Intelligence, 32, 95–107.

Johnson, W., te Nijenhuis, J., & Bouchard, T. J., Jr. (in press). Replication of the hierarchical visual-perceptual-image rotation model in de Wolff and Buiten's (1963) battery of 46 tests of mental ability. Intelligence.

Jones, R. J. (1986). A comparison of the predictive validity of the MCAT for coached and uncoached students. Journal of Medical Education, 61, 335–338.

Kaufman, A. S., & Kaufman, N. L. (1983). K-ABC: Kaufman Assessment Battery for Children. Interpretive manual. Circle Pines, MN: AGS.


Kooij, A. P., Rolfhus, E., Wilkins, C., Yang, Z., & Zhu, J. (2005). WAIS-III Nederlandstalige bewerking: Technisch rapport hernormering [WAIS-III adaptation in Dutch: Technical report on renorming]. Amsterdam: Harcourt.

Kort, W., Schittekatte, M., Dekker, P. H., Verhaeghe, P., Compaan, E. L., Bosmans, M., & Vermeir, G. (2005). WISC-III NL: Wechsler Intelligence Scale for Children, Derde Editie NL. Handleiding en verantwoording [The Dutch WISC-III: Wechsler Intelligence Scale for Children, Third Edition for the Netherlands. Manual and justification]. Amsterdam: NIP.

Kulik, J. A., Bangert-Drowns, R. L., & Kulik, C. C. (1984). Effectiveness of coaching for aptitude tests. Psychological Bulletin, 95, 179–188.

Kulik, J. A., Kulik, C. C., & Bangert, R. L. (1984). Effects of practice on aptitude and achievement test scores. American Educational Research Journal, 21, 435–447.

Lee, K. H., Choi, Y. Y., Gray, J. R., Cho, S. H., Chae, J.-H., Lee, S., et al. (2006). Neural correlates of superior intelligence: Stronger recruitment of posterior parietal cortex. NeuroImage, 29(2), 578–586.

Linn, R. L. (1977). On the treatment of multiple scores for Law School Admission Test repeaters (Report LSAC-77-4). In Law School Admission Council, Reports of LSAC sponsored research: Volume III, 1975–1977. Princeton, NJ: Law School Admission Council.

Lipson, L. E. (1992). Relationship of static and dynamic measures to scholastic achievement of black pupils. Unpublished MEd dissertation, University of Witwatersrand, South Africa.

Lloyd, F., & Pidgeon, D. A. (1961). An investigation into the effects of coaching on non-verbal test material with European, Indian, and African children. British Journal of Educational Psychology, 31, 145–151.

Luteijn, F., & Barelds, D. P. H. (2005). GIT2: Groninger Intelligentie Test 2 [GIT2: Groningen Intelligence Test 2]. Amsterdam: Harcourt.

Lynn, R. (1990). The role of nutrition in secular increases in intelligence. Personality and Individual Differences, 11, 273–285.

Lynn, R. (1998). In support of the nutrition theory. In U. Neisser (Ed.), The rising curve: Long-term gains in IQ and related measures (pp. 207–215). Washington, DC: American Psychological Association.

Lynn, R., Allik, J., & Irwing, P. (2004). Sex differences on three factors identified in Raven's Standard Progressive Matrices. Intelligence, 32, 411–424.

Lynn, R., & Hampson, S. (1986). The rise of national intelligence: Evidence from Britain, Japan and the USA. Personality and Individual Differences, 7, 23–32.

Matarazzo, J. D., Carmody, T. P., & Jacobs, L. D. (1980). Test–retest reliability and stability of the WAIS: A literature review with implications for clinical practice. Journal of Clinical Neuropsychology, 2(2), 89–105.

McCormick, B. K., Dunlap, W. P., Kennedy, R. S., & Jones, M. B. (1983). The effects of practice on the Armed Forces Vocational Aptitude Test Battery. US Army Research Institute for the Behavioral and Social Sciences, Technical Report 602.

Mulder, J. L., Dekker, R., & Dekker, P. H. (2004). Kaufman Intelligentietest voor adolescenten en volwassenen (KAIT): Handleiding [Kaufman Intelligence Test for adolescents and adults (KAIT): Manual]. Leiden, the Netherlands: PITS.

Murphy, R. (2002). A review of South African research in the field of dynamic assessment. Unpublished MA dissertation, University of Pretoria (available online from http://upetd.up.ac.za/thesis/available/etd-05042002-161239).

Nel, A. (1997). Die voorspelling van akademiese sukses binne konteks van 'n alternatiewe universiteitstoelatingsbeleid [The prediction of academic success within the context of an alternative policy of university admission]. Unpublished MA dissertation, Rand Afrikaans University, South Africa.

Neubauer, A. C., & Freudenthaler, H. H. (1994). Reaction time in a sentence-picture verification test and intelligence: Individual strategies and effects of extended practice. Intelligence, 19, 193–218.

Nunnally, J. C., & Bernstein, I. H. (1994). Psychometric theory (3rd ed.). New York: McGraw-Hill.

Olsen, M., & Schrader, W. B. (1959). The use of preliminary and final Scholastic Aptitude Test scores in predicting college grades (College Entrance Examination Board Research and Development Reports and Statistical Reports 59-19). Princeton, NJ: Educational Testing Service.

Ortar, G. R. (1960). Improving test validity by coaching. Educational Research, 2, 137–142.

Powers, D. E. (1985). Effects of test preparation on the validity of Graduate Admission Test. Applied Psychological Measurement, 9, 179–190.

Raven, J., Raven, J. C., & Court, J. H. (2000). Standard Progressive Matrices: Raven manual, Section 3. Oxford Psychologists Press.

Ree, M. J., & Carretta, T. R. (1994). The correlation of general cognitive ability and psychomotor tracking tests. International Journal of Selection and Assessment, 2, 209–216.

Ree, M. J., & Earles, J. A. (1991). Predicting training success: Not much more than g. Personnel Psychology, 44, 321–332.

Ree, M. J., Earles, J. A., & Teachout, M. S. (1994). Predicting job performance: Not much more than g. Journal of Applied Psychology, 79, 518–524.

Reeve, C. L., & Lam, H. (2005). The psychometric paradox of practice effects due to retesting: Measurement invariance and stable ability estimates in the face of observed score changes. Intelligence, 33, 535–549.

Resing, W. C. M. (1990). Intelligentie en leerpotentieel: Een onderzoek naar het leerpotentieel van jonge leerlingen uit het basis- en speciaal onderwijs [Intelligence and learning potential: A study into the learning potential of young students in basic and special education]. Amsterdam, the Netherlands: Swets.

Rushton, J. P. (1999). Secular gains in IQ are not related to the g factor and inbreeding depression, unlike black–white differences: A reply to Flynn. Personality and Individual Differences, 26, 381–389.

Rushton, J. P., Skuy, M., & Bons, T. A. (2004). Construct validity of Raven's Advanced Progressive Matrices for African and non-African engineering students in South Africa. International Journal of Selection and Assessment, 12(3), 220–229.

Schmidt, F. L., & Hunter, J. E. (1998). The validity and utility of selection methods in personnel psychology: Practical and theoretical implications of 85 years of research findings. Psychological Bulletin, 124(2), 262–274.

Schmidt, F. L., & Hunter, J. E. (1999). Theory testing and measurement error. Intelligence, 27(3), 183–198.

Schmidt, F. L., & Le, H. (2004). Software for the Hunter-Schmidt meta-analysis methods. University of Iowa, Department of Management and Organization, Iowa City, IA 42242.

Schroots, J. J. F., & van Alphen de Veer, R. J. (1979). LDT: Leidse Diagnostische Test. Deel 1: Handleiding [LDT: Leiden Diagnostic Test. Part 1: Manual]. Lisse, the Netherlands: Swets.

Shochet, I. M. (1986). Manifest and potential performance in advantaged and disadvantaged students. Unpublished DPhil dissertation, University of the Witwatersrand, South Africa.


Skuy, M., Gewer, A., Osrin, Y., Khunou, D., Fridjon, P., & Rushton, J. P. (2002). Effects of mediated learning experience on Raven's Matrices scores of African and non-African university students in South Africa. Intelligence, 30, 221–232.

Swanson, H. E., & Lussier, C. M. (2001). A selective synthesis of the experimental literature on dynamic assessment. Review of Educational Research, 71, 321–363.

Teasdale, T. W., & Owen, D. R. (1989). Continuing secular increase in intelligence and a stable prevalence of high intelligence levels. Intelligence, 13, 255–262.

Tuma, J. M., & Appelbaum, A. S. (1980). Reliability and practice effects of WISC-R IQ estimates in a normal population. Educational and Psychological Measurement, 40, 671–678.

te Nijenhuis, J., Tolboom, E., Resing, W., & Bleichrodt, N. (2004). Does cultural background influence the intellectual performance of children from immigrant groups? Validity of the RAKIT intelligence test for immigrant children. European Journal of Psychological Assessment, 20, 10–26.

te Nijenhuis, J., & van der Flier, H. (1997). Comparability of GATB scores for immigrants and majority group members: Some Dutch findings. Journal of Applied Psychology, 82, 675–687.

te Nijenhuis, J., Voskuijl, O. F., & Schijve, N. B. (2001). Practice and coaching on IQ tests: Quite a lot of g. International Journal of Selection and Assessment, 9, 302–308.

Thorndike, R. L. (1985). The central role of general ability in prediction. Multivariate Behavioral Research, 20, 241–254.

Tucker, L. R., Damarin, F., & Messick, S. (1966). A base-free measure of change. Psychometrika, 31(4), 457–473.

van der Doef, M. P., Kwint, J. M., & van der Koppel (1989). Wat leren moeilijk lerende kinderen van de WISC-R? [What do children who have difficulties in learning learn from the WISC-R?]. Kind en Adolescent, 10, 136–141.

United States Department of Labor. (1970). Manual for the USTES General Aptitude Test Battery. Section III: Development. Washington, DC: United States Department of Labor.

van Geffen (1972). De betrouwbaarheid van de GATB 1002-B op brugklasniveau [The reliability of the GATB 1002-B for the first class at secondary school]. Catholic University Nijmegen, the Netherlands: Psychology of Work and Organisation.

van Haasen, P. P., de Bruyn, E. E. J., Pijl, Y. J., Poortinga, Y. H., Lutje Spelberg, H. C., Vander Steene, G., et al. (1986). WISC-R: Wechsler Intelligence Scale for Children-Revised, Nederlandstalige uitgave [WISC-R: Wechsler Intelligence Scale for Children-Revised, Dutch edition]. Lisse, the Netherlands: Swets.

Wechsler, D. (1955). Manual for the Wechsler Adult Intelligence Scale. New York: The Psychological Corporation.

Wechsler, D. (1967). Manual for the Wechsler Preschool and Primary Scale of Intelligence. New York: The Psychological Corporation.

Wechsler, D. (1974). Manual for the Wechsler Intelligence Scale for Children-Revised. New York: The Psychological Corporation.

Wechsler, D. (1981). WAIS-R manual: Wechsler Adult Intelligence Scale-Revised. New York: The Psychological Corporation.

Wechsler, D. (1997). WAIS-III: Wechsler Adult Intelligence Scale-third edition and WMS-III: Wechsler Memory Scale-third edition. Technical manual. New York: The Psychological Corporation.

Wicherts, J. W., Dolan, C. V., Oosterveld, P., van Baal, G. C. V., Boomsma, D. I., & Span, M. M. (2004). Are intelligence tests measurement invariant over time? Investigating the nature of the Flynn effect. Intelligence, 32(5), 509–537.

Yeld, N., & Haeck, W. (1997). Educational histories and academic potential: Can tests deliver? Assessment and Evaluation in Higher Education, 22, 5–16.

Zaaiman, H. (1998). Selecting students for mathematics and science: The challenge facing higher education in South Africa. Pretoria, South Africa: HSRC Publishers.

Zaaiman, H., van der Flier, H., & Thijs, G. D. (2001). Dynamic testing in selection for an educational programme: Assessing South African performance on the Raven Progressive Matrices. International Journal of Selection and Assessment, 9, 258–269.

Zolezzi, S. A. (1992). Alternative selection measures for university undergraduate admissions. Unpublished MEd dissertation, University of the Witwatersrand, South Africa.

Zolezzi, S. A. (1995). The effectiveness of dynamic assessment as an alternative aptitude testing strategy. Unpublished DPhil dissertation, University of South Africa, South Africa.

Table 1 (continued)

Reference | Test | r | N | Information
McCormick et al. (1983) | ASVAB | −.73 | 57 | Adults
Kaufman and Kaufman (1983) | K-ABC | −.27 | 46 | Pre-school children
 | | −.18 | 36 | Pre- and primary-school children
 | | −.22 | 70 | Primary-school children
Wechsler (1997) | WAIS-III | −.45 | 100 | Young adults
 | | −.57 | 102 | Adults
 | | −.51 | 104 | Adults
 | | .03 | 88 | Adults
Reeve and Lam (2005) | EAS | −.34 | 123 | Undergraduate students

In general, the g loadings were based on the correlation matrix taken from the manuals containing the test–retest studies, or on the correlation matrix based on the largest sample size we could find. The following list gives the source of the g loadings wherever they were not taken from the manual containing the test–retest study: van Geffen (1972) and Bosch (1973): de Wolff and Buiten (1963), see also Johnson, te Nijenhuis, and Bouchard (in press); Bleichrodt et al. (1987): te Nijenhuis et al. (2004), who used the same data on which the RAKIT manual is based; van der Doef, Kwint, and van der Koppel (1989): Dutch WISC-R manual; Elliott (1983): Table 9.8, age 9:0–9:11 years; US Dept. of Labor's GATB (1970): Jensen (1985, p. 214), using the largest correlation matrix in the GATB manual; Wechsler (1974), Covin (1977), and Tuma and Appelbaum (1980): Jensen (1985, p. 214, first study); Bennett et al. (1974): average of four highly similar correlation matrices; Matarazzo et al. (1980): Wechsler's (1955, p. 17) Table 8 for ages 25–34; McCormick et al. (1983): Ree and Carretta (1994). Reeve and Lam (2005) used SEM analyses and item parcels instead of full scale scores to compute g loadings; the average g loading of all item parcels for a specific subtest was taken as the g loading of that subtest.
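The note above does not say how the g loadings were extracted from each correlation matrix. As a rough sketch only, the loadings on the first unrotated principal component are a common approximation to g loadings. Principal axis or hierarchical factoring would give somewhat different values. The helper names, the matrix, and the parcel loadings below are hypothetical. The second helper mirrors the averaging of item-parcel loadings described for Reeve and Lam (2005).

```python
import numpy as np


def first_component_loadings(corr):
    """Approximate g loadings as loadings on the first principal component of a
    correlation matrix (leading eigenvector times sqrt of its eigenvalue)."""
    R = np.asarray(corr, dtype=float)
    eigenvalues, eigenvectors = np.linalg.eigh(R)  # eigenvalues in ascending order
    loadings = eigenvectors[:, -1] * np.sqrt(eigenvalues[-1])
    # An eigenvector's sign is arbitrary; orient the loadings to be positive.
    return loadings if loadings.sum() >= 0 else -loadings


def subtest_loading_from_parcels(parcel_loadings):
    """Average the g loadings of a subtest's item parcels into one subtest loading."""
    return float(np.mean(parcel_loadings))


# Hypothetical correlation matrix for a four-subtest battery.
R = np.array([
    [1.00, 0.60, 0.50, 0.40],
    [0.60, 1.00, 0.45, 0.35],
    [0.50, 0.45, 1.00, 0.30],
    [0.40, 0.35, 0.30, 1.00],
])
print(np.round(first_component_loadings(R), 2))
print(subtest_loading_from_parcels([0.62, 0.58, 0.66]))
```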


risk of overcorrection, we conservatively chose the value of .90 for the correction.

9. Results

The results of the studies on the correlation between g loadings and gain scores are shown in Table 1. The table gives data derived from 64 studies with a total of 26,990 participants. For each study, it lists the reference, the cognitive ability test used, the correlation between g loadings and gain scores, the sample size, and background information. Virtually all correlations are negative, and the few positive correlations are very small.
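A psychometric meta-analysis begins with the bare-bones step (Hunter & Schmidt, 2004). This step computes a sample-size-weighted mean correlation, the weighted variance of the observed correlations, and the share of that variance expected from sampling error alone. The sketch below applies these generic formulas to the nine correlations in the continuation of Table 1, treating N as the number of participants. It uses only a subset of the 64 studies and omits the further artifact corrections described in the text, so its output is not expected to reproduce Table 2. The class and function names are our own.

```python
from dataclasses import dataclass


@dataclass
class Study:
    r: float  # correlation between the g vector and the gain (d) vector
    n: int    # number of participants in the test-retest sample


def bare_bones(studies):
    """Bare-bones Hunter-Schmidt meta-analysis of correlations.

    Returns the sample-size-weighted mean r, the weighted SD of the observed
    r values, and the percentage of their variance expected from sampling
    error alone.
    """
    total_n = sum(s.n for s in studies)
    mean_r = sum(s.n * s.r for s in studies) / total_n
    var_r = sum(s.n * (s.r - mean_r) ** 2 for s in studies) / total_n
    mean_n = total_n / len(studies)
    var_e = (1 - mean_r**2) ** 2 / (mean_n - 1)  # expected sampling-error variance
    pct_explained = float("inf") if var_r == 0 else 100 * var_e / var_r
    return mean_r, var_r**0.5, pct_explained


# The nine correlations in the continuation of Table 1 (a subset of the 64).
rows = [Study(-0.73, 57), Study(-0.27, 46), Study(-0.18, 36), Study(-0.22, 70),
        Study(-0.45, 100), Study(-0.57, 102), Study(-0.51, 104), Study(0.03, 88),
        Study(-0.34, 123)]
mean_r, sd_r, pct = bare_bones(rows)
print(f"K = {len(rows)}, N = {sum(s.n for s in rows)}, mean r = {mean_r:.2f}, "
      f"SD_r = {sd_r:.2f}, sampling error explains {pct:.0f}% of the variance")
```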

Table 2 shows the results of the psychometric meta-analysis of the 64 data points. It shows (from left to right) the number of correlation coefficients (K), the total sample size (N), the mean observed correlation (r) and its standard deviation (SDr), the true correlation one can expect once artifactual error from unreliability in the g vector and the d vector and from range restriction in the g vector has been removed (ρ), and its standard deviation (SDρ). The next two columns present the percentage of variance explained by artifactual errors (%VE) and the 95% credibility interval (95% CI). This interval denotes the values one can expect for ρ in 19 out of 20 cases.

Table 2
Meta-analysis results for correlations between g loadings and gain scores after corrections for reliability and restriction of range

Studies included   K    N        r     SDr   ρ     SDρ   %VE   95% CI
All                64   26,990   −.80  .20   −.95  .11   81    −0.74 to −1.16
All − 3 outliers   61   26,704   −.81  .18   −.95  .03   99    −0.91 to −1.00

Note. K = number of correlations; N = total sample size; r = mean observed correlation (sample size weighted); SDr = standard deviation of observed correlation; ρ = true correlation (observed correlation corrected for unreliability and range restriction); SDρ = standard deviation of true correlation; %VE = percentage of variance accounted for by artifactual errors; 95% CI = 95% credibility interval.

The large number of data points and the very large sample size indicate that we can have confidence in the outcomes of this meta-analysis. The estimated true correlation has a value of −.95, and 81% of the variance in the observed correlations is explained by artifactual errors. However, Hunter and Schmidt (1990) state that extreme outliers should be left out of the analyses, because they are most likely the result of errors in the data. They also argue that strong outliers artificially inflate the SD of effect sizes and thereby reduce the amount of variance that artifacts can explain. We chose to leave out three outliers (more than 4 SD below the average r and more than 8 SD below ρ), comprising 1% of the research participants. This resulted in no change in the value of the true correlation, a large decrease (74%) in the SD of ρ, and a large increase (22%) in the amount of variance in the observed correlations explained by artifacts. So, when the three outliers are excluded, artifacts explain virtually all of the variance in the observed correlations. Finally, a correction for deviation from perfect construct validity in g took place, using a conservative value of .90. This resulted in a value of −1.06 for the final estimated true correlation between g loadings and score gains. Applying several corrections in a meta-analysis may lead to correlations that are larger than 1.00 or smaller than −1.00, as is the case here. Percentages of variance accounted for by artifacts larger than 100% are also not uncommon in psychometric meta-analysis; they also occur in other methods of statistical estimation (see Hunter & Schmidt, 1990, pp. 411–414, for a discussion).
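The meta-analytic steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: `bare_bones` implements only the sampling-error step of a Hunter–Schmidt meta-analysis (the analysis in Table 2 also corrects for unreliability of the g and d vectors and for range restriction), the function names are hypothetical, and the three correlations and sample sizes in the usage lines are invented. The last two functions show the arithmetic behind the 95% credibility interval and the final correction for construct validity (−.95 / .90 ≈ −1.06).

```python
import math


def bare_bones(rs, ns):
    """Sampling-error-only ("bare-bones") Hunter-Schmidt meta-analysis.

    rs: observed correlations; ns: matching sample sizes.
    Returns (sample-size-weighted mean r, SD of observed r,
             % of observed variance explained by sampling error).
    """
    n_total = sum(ns)
    k = len(rs)
    r_bar = sum(n * r for r, n in zip(rs, ns)) / n_total
    var_obs = sum(n * (r - r_bar) ** 2 for r, n in zip(rs, ns)) / n_total
    # Expected sampling-error variance, based on the average sample size
    var_err = (1 - r_bar ** 2) ** 2 / (n_total / k - 1)
    pct_ve = 100 * var_err / var_obs if var_obs > 0 else float("inf")
    return r_bar, math.sqrt(var_obs), pct_ve


def credibility_interval(rho, sd_rho, z=1.96):
    """95% credibility interval: the range expected for rho in 19 of 20 cases."""
    return rho - z * sd_rho, rho + z * sd_rho


def correct_construct_validity(rho, validity=0.90):
    """Disattenuate rho for imperfect construct validity of the g vector."""
    return rho / validity


# Hypothetical input: three studies with 100 participants each
r_bar, sd_r, pct_ve = bare_bones([-0.80, -0.70, -0.90], [100, 100, 100])
print(round(r_bar, 2), round(sd_r, 3), round(pct_ve, 1))

# Values reported in Table 2 and the text
print(credibility_interval(-0.95, 0.11))
print(round(correct_construct_validity(-0.95), 2))  # -1.06
```

Because a correction divides by a number smaller than 1, an estimate already close to −1.00 can be pushed past it, which is why the final value of −1.06 lies outside the range of a correlation.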

10. Discussion

A large-scale meta-analysis of 64 test–retest studies shows that, after corrections for several artifacts, there is an estimated true correlation of −1.06 between the g loadings of tests and score gains, and that virtually all of the variance in observed correlations is attributable to these artifacts. As several artifacts explain virtually all the variance in the effect sizes, other dimensions on which the studies differ, such as the age of the test takers, the test–retest interval, the test used, and whether samples were of average IQ or had learning problems, play no role at all.

The estimated true correlation of −1.06 is the result of various corrections for artifacts that attenuate the correlations. The estimated values of the artifacts may underestimate or overestimate the population values of the artifacts; therefore, estimates of true effect sizes may overestimate or underestimate the population values of the effect size. As a solution to this problem, Hunter and Schmidt (2004) suggest carrying out several meta-analyses on the same construct and taking the average estimated effect size of all meta-analyses. The general idea is that meta-analysis is a powerful research tool but does not give perfect outcomes.

A correlation of −1.06 falls outside the range of acceptable values of a correlation, but one has to distinguish between the meta-analytical estimate of the true correlation between g and d and the true correlation between g and d itself. We interpret the meta-analytical estimate of −1.06 as meaning that the true correlation between g and d is −1.00. A correlation of −1.00 means that there is an inverse relationship between g and score gains: the tests with the highest g loadings show the smallest gains. The most straightforward interpretation of this very large negative correlation is that there is no g saturation in test–retest gain scores.

11. The South African learning potential study

In a carefully carried-out study, Skuy et al. (2002) used a dynamic testing procedure to see whether it would improve the scores of Black South African students on Raven's Standard Progressive Matrices (RSPM). The Bantu Education Act of 1954 established a discriminatory educational system characterized by poorly qualified teachers, sparsely equipped and poorly funded schools, and generally poor quality. Most Black students in the sample had not received the same quality of education as White students. Black, White, Indian, and Colored research participants took the RSPM on two occasions, and in between, randomly constituted experimental groups were exposed to the Mediated Learning Experience. Both the Black South African group and the group consisting of White, Indian, and Colored South Africans improved over their baseline on the RSPM, and the Black group showed greater improvement.

The value of these cognitive interventions increases when the score gains transfer to other tests and to external criteria such as school or work achievement. Therefore, the research participants also took Feuerstein's Representational Stencil Design Test as a transfer measure. Test takers are presented with a stencil of a geometric design and then asked to point to the stencils that need to be used, and in what sequence, to construct an identical design. Like the RSPM, the Stencils test requires representational/abstract thinking, but the training on the RSPM showed little transfer to it. Moreover, the correlation of the RSPM scores with performance in the end-of-year psychology examination did not significantly improve after mediation. Once again, the score gains were empty: they did not generalize. Skuy et al. go on to ask what it was that their interventions improved. Professor Skuy made his data accessible to the present authors, so we could perform additional analyses.

12. Sample

The data from Skuy et al. (2002) were used, with the exception of data from three research participants whose pretest IQ scores were extremely low (more than 3 SDs below the group mean). Ninety-five university students in psychology, aged 16 to 29 (mean age = 20, SD = 2.3; 25 males, 70 females), participated in this study. There were 66 Black students (20 males, 46 females) and 29 White (20), Indian (6), and Colored (3) students (5 males, 24 females). The mean age of the Black group was 20 (SD = 2.5) and that of the White/Indian/Colored group 19 years (SD = 1). Participants were randomly assigned to the experimental group (n = 55) or the control group (n = 40).


13. Procedure

The students participated in pre- and posttest phases, with a group intervention in between. The study focused on improvement in scores on the RSPM, using Set Variations II of the Learning Propensity Assessment Device as the mediation task. Mediation training took 3 h and was conducted by three experienced psychologists with the assistance of six postgraduate psychology students. A detailed description is given in Skuy et al. (2002).

14. Measures and cognitive intervention

Raven's Standard Progressive Matrices consists of 60 items (divided into 5 sets of 12 items) designed to measure the ability to form comparisons, to reason by analogy, and to organize spatial information into related wholes. It has been established as one of the purest measures of g (Jensen, 1998a). Skuy et al. (2002) found no evidence for test bias against Blacks in South African education. Rushton, Skuy, and Bons (2004) showed that the Raven's gave comparable predictive validities for students from various groups. Cross-cultural testing research has clearly shown that insufficient proficiency in the language of the test can lead to biased assessments in tests with a strong verbal component; however, the Raven's is a non-verbal test.

The Learning Propensity Assessment Device consists of 14 exercises. Each exercise contains an initial mediation task; subsequent tasks increase in complexity and novelty and aim to help the learner achieve mastery over the task. The purpose of mediation is to help the learner develop the cognitive strategies and functions needed for the successful completion of the task. Set Variations II of the Learning Propensity Assessment Device consists of five sets of items, which are variations of Sets C, D, and E of the RSPM. Each set of variations contains a learning task for the purpose of initial mediation, followed by a series of progressively more difficult variations to which the skills learned must be applied. Mediation involves discussing with groups how to define the problem to be solved, focus on the task, set rules, regulate problem-solving behavior, and identify the correct sequence of logical steps needed to solve the task. Mediation also involves helping the participant to develop appropriate concepts, verbal tools, and insights in relation to the task. A detailed description is given in Skuy et al. (2002).

15. Statistical analyses

Although the Skuy et al. study is among the South African learning potential studies with the largest sample size, the N is not large. We therefore chose basic statistical analyses.

15.1. Descriptive statistics

Means, standard deviations, and reliabilities were computed for the various groups. With regard to measures of effect size, Hunter and Schmidt (1990, p. 271) advise choosing the estimate of variance with the least error. Because repeated test taking tends to change the size of the SD (Ackerman, 1987), we used the SD of the pretest as the denominator. The correlation between scores before and after the training was computed to see whether the training had an effect on the rank order of individuals' scores.
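As a minimal sketch of this effect-size definition (posttest mean minus pretest mean, divided by the pretest SD), the following Python recomputes the effect sizes of the four groups from the means and SDs reported in Table 4. The function name `pretest_d` is hypothetical.

```python
def pretest_d(pre_mean, post_mean, pre_sd):
    """Standardized gain, using the pretest SD as the denominator."""
    return (post_mean - pre_mean) / pre_sd


# (pretest M, posttest M, pretest SD) per group, as reported in Table 4
groups = {
    "Black experimental": (43.78, 50.10, 6.64),
    "Black control": (45.46, 48.35, 6.69),
    "Other experimental": (50.20, 55.80, 6.05),
    "Other control": (52.71, 55.36, 3.45),
}

for name, (pre, post, sd) in groups.items():
    print(f"{name}: d = {pretest_d(pre, post, sd):.2f}")
```

Using the posttest SD instead would inflate d for groups whose SD shrank after training; for the White/Indian/Colored experimental group, for example, 5.60 / 3.76 ≈ 1.49 instead of 0.93.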

15.2. Correlation between score gains and g loadedness

Because our sample was small and quite specific, estimates of g loadedness were taken from Lynn, Allik, and Irwing's (2004) item analysis of the RSPM in Estonia, which used a large (N = 2735), nationally representative sample. The same reasoning as in psychometric meta-analysis applies, namely, that larger samples give better estimates of g loadings than smaller samples. In a hierarchical factor analysis of the items using structural equation modeling, Lynn et al. computed g loadings for 52 of the 60 items. In the present study, Pearson correlations were calculated between the g loadings of these 52 items and the effect sizes on these items.
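At its core, the method of correlated vectors used here is an ordinary Pearson correlation between two item-level vectors: the g loadings and the effect sizes. A minimal sketch in plain Python; the five-item vectors are hypothetical and are not Lynn et al.'s loadings or the Skuy et al. gains.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("vectors must have the same length (>= 2)")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical item-level vectors (one entry per item, same item order)
g_loadings = [0.35, 0.42, 0.50, 0.58, 0.66]
item_gains = [0.40, 0.31, 0.28, 0.15, 0.10]

print(round(pearson(g_loadings, item_gains), 2))
```

A strongly negative value means that the most g-loaded items gained the least; the vectors must list the items in the same order for the correlation to be meaningful.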

15.3. g loadings

The RSPM consists of dichotomous items, so we computed a matrix of polychoric correlations (Nunnally & Bernstein, 1994). A principal axis factor analysis was carried out, and the percentage of variance explained by the first unrotated factor was taken as an estimate of g loadedness. Because sample size was limited, we collapsed the experimental and control groups.
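The factor-analytic step can be illustrated with an iterated principal axis factoring that extracts a single factor. This is a sketch under stated assumptions, not the software used in the study: it takes an already-estimated correlation matrix as input (estimating the polychoric correlations themselves is not shown), starts from squared multiple correlations as communality estimates, and retains one factor; statistical packages differ in these defaults. The function name is hypothetical, and the four-variable matrix in the usage lines is invented and exactly one-factorial, so the recovered loadings can be checked.

```python
import numpy as np


def paf_first_factor(R, max_iter=1000, tol=1e-10):
    """Iterated principal axis factoring with one factor.

    Returns (loadings on the first unrotated factor,
             % of total variance explained by that factor).
    """
    R = np.asarray(R, dtype=float)
    p = R.shape[0]
    # Initial communalities: squared multiple correlations
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    for _ in range(max_iter):
        reduced = R.copy()
        np.fill_diagonal(reduced, h2)          # communalities on the diagonal
        vals, vecs = np.linalg.eigh(reduced)   # eigenvalues in ascending order
        loadings = vecs[:, -1] * np.sqrt(max(vals[-1], 0.0))
        new_h2 = loadings ** 2
        converged = np.max(np.abs(new_h2 - h2)) < tol
        h2 = new_h2
        if converged:
            break
    if loadings.sum() < 0:                     # eigenvector sign is arbitrary
        loadings = -loadings
    pct_var = 100.0 * np.sum(loadings ** 2) / p
    return loadings, pct_var


# Hypothetical one-factor correlation matrix with loadings .8, .7, .6, .5
lam = np.array([0.8, 0.7, 0.6, 0.5])
R = np.outer(lam, lam)
np.fill_diagonal(R, 1.0)

loadings, pct = paf_first_factor(R)
print(np.round(loadings, 3), round(pct, 1))
```

The percentage of variance is the sum of squared loadings divided by the number of variables, which is the quantity compared between pretest and posttest in Section 16.3.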

15.4. Correlation between sum scores and score gains

We tested whether individuals with low g improved their scores more than those with high g by correlating gain scores with pretest RSPM scores for each of the four research groups. As gain scores tend to be negatively correlated with pretest scores as a function of unreliability (see Cronbach, 1990; Nunnally & Bernstein, 1994), we corrected the correlations using Tucker, Damarin, and Messick's (1966) formula 63. Using this formula, one adds to each correlation the term (SD of pretest / SD of gain score) × (1 − reliability of pretest).

Table 3
Proportion of sample selecting the correct answer on items of Raven's Standard Progressive Matrices, by group

Set A                   Set B                   Set C                   Set D                   Set E
Item Black  Other       Item Black  Other       Item Black  Other       Item Black  Other       Item Black  Other
1    1.00   1.00        13   1.00   1.00        25   1.00   .97         37   1.00   1.00        49   .74    .90
2    .97    1.00        14   1.00   1.00        26   .96    1.00        38   .99    1.00        50   .64    .90
3    .97    1.00        15   1.00   1.00        27   .96    1.00        39   .89    1.00        51   .79    .97
4    1.00   .97         16   .91    .97         28   .86    .93         40   .92    1.00        52   .56    .83
5    1.00   1.00        17   .96    .97         29   .94    .97         41   .96    1.00        53   .52    .83
6    .99    1.00        18   .85    1.00        30   .76    .83         42   .92    1.00        54   .35    .76
7    .94    .97         19   .77    .66         31   .88    .97         43   .77    1.00        55   .42    .79
8    .91    .93         20   .79    .97         32   .50    .79         44   .76    .93         56   .21    .69
9    1.00   .97         21   .83    .97         33   .74    .90         45   .71    .97         57   .30    .41
10   .91    .97         22   .92    1.00        34   .61    .79         46   .79    .93         58   .12    .41
11   .83    .90         23   .80    .90         35   .53    .69         47   .29    .41         59   .02    .17
12   .68    .83         24   .59    .83         36   .06    .35         48   .26    .38         60   .11    .21

Note. Other = White, Indian, and Colored.
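The Tucker, Damarin, and Messick correction described in Section 15.4 amounts to a single additive term. A minimal sketch that implements it exactly as stated in the text; the function name is hypothetical, and the input values in the usage line are invented, not those of the four research groups.

```python
def tucker_corrected_r(r_pre_gain, sd_pre, sd_gain, rel_pre):
    """Add (SD_pretest / SD_gain) * (1 - reliability_pretest) to the
    observed pretest-gain correlation, as described in the text
    (Tucker, Damarin, & Messick, 1966)."""
    if sd_gain <= 0:
        raise ValueError("SD of gain scores must be positive")
    return r_pre_gain + (sd_pre / sd_gain) * (1.0 - rel_pre)


# Hypothetical values: r = -.50, SD_pretest = 6.0, SD_gain = 3.0, reliability = .90
print(round(tucker_corrected_r(-0.50, 6.0, 3.0, 0.90), 2))
```

Because the added term is non-negative whenever reliability is at most 1, the correction can only raise the correlation, offsetting the spurious negative relation that pretest measurement error induces; with perfect reliability it leaves the correlation unchanged.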

16. Results

16.1. Descriptive statistics

Internal consistencies (Cronbach's αs) on the RSPM were .76 for the pretest and .86 for the posttest. Table 3 shows the proportion of each group that selected the correct answer on each of the 60 items of the pretest. Across the 60 items, the order of the p values was almost identical for Blacks and White/Indian/Coloreds (r = .92, p = .00).

Table 4 shows the means and standard deviations of the total RSPM scores for the four groups, along with the d effect sizes representing the difference between pre- and posttest scores (Cohen, 1988). First, we examined whether there was an effect of race (Black vs. White/Indian/Colored) and group (experimental vs. control) on the pretest scores. There was a significant effect of race (F(1, 91) = 24.13, p = .00, η² = .21) but not of group (F(1, 91) = 2.28, p = .14, η² = .02). This means that the mean pretest scores of Blacks (M = 44.44, SD = 6.65) were lower than those of White/Indian/Coloreds (M = 51.41, SD = 5.05), and that the mean pretest scores of the experimental and control groups were comparable (M = 45.53, SD = 7.04, and M = 48, SD = 6.7, respectively).

Table 4
Pre- and posttest mean Raven's scores, standard deviations, and mean effect sizes for Black and White/Indian/Colored students

                 Black experimental  Black control       Other experimental  Other control
                 (n = 40)            (n = 26)            (n = 15)            (n = 14)
                 Pretest  Posttest   Pretest  Posttest   Pretest  Posttest   Pretest  Posttest
Raw score M      43.78    50.10      45.46    48.35      50.20    55.80      52.71    55.36
SD               6.64     5.31       6.69     6.71       6.05     3.76       3.45     3.43
Percentile       14       41         16       31         41       75         55       68
Effect size      0.95                0.43                0.93                0.77

Note. Percentiles are based on US adult norms; see Raven, Raven, and Court's (2000) Table SPM13. Other = White, Indian, and Colored.

Secondly, we investigated the effects of training on the posttest scores by performing a two-way ANCOVA on the total posttest scores, with race and group as factors and the total pretest scores as the covariate. There was a significant effect of group (F(1, 95) = 13.81, p = .00, η² = .13) and of race (F(1, 90) = 3.99, p = .05, η² = .04), but not of the two-way interaction of group and race (F(1, 90) = 0.28, p = .60, η² = .00). These results indicate that the training was equally effective for the Black and the White/Indian/Colored students. Posttest scores of Blacks (M = 49.41, SD = 5.91), however, remained significantly lower (F(1, 91) = 28.33, p = .00) than those of Whites/Indians/Coloreds (M = 55.59, SD = 3.55). Although posttest scores of the experimental group (M = 51.65, SD = 5.53) were higher than those of the control group (M = 50.8, SD = 6.65), the difference between the two groups was nonsignificant (F(1, 91) = 0.85, p = .36).

The correlation between scores before and after the training was .84 (p = .00) for the experimental group and .90 (p = .00) for the control group, showing that the training had only a limited effect on the rank order of individuals' scores. This means that the test measures the same constructs on both occasions strongly, but not perfectly.

16.2. Correlation between score gains and g loadedness

We estimated effect sizes for each of the four groups (race by condition) by computing the difference between mean pretest and posttest scores, divided by the standard deviation of the pretest scores of Black and White/Indian/Colored students, respectively. Finally, we calculated the correlations between these effect sizes and the g loadings taken from Lynn et al. Correlations were −.24 (p = .10) for the Black experimental group, −.21 (p = .20) for the White/Indian/Colored experimental group, −.08 (p = .59) for the Black control group, and −.41 (p = .01) for the White/Indian/Colored control group. Small sample sizes usually attenuate correlations (Hunter & Schmidt, 1990). Collapsing the groups indeed resulted in higher average correlations: −.39 for the complete experimental group and −.26 for the complete control group.

16.3. g loadings

Using the combined experimental and control group, principal axis factor analyses of the pretest and posttest scores resulted in a first unrotated factor explaining 22% of the variance in the pretest scores and 18% of the variance in the posttest scores. These findings suggest that the g loadedness of the RSPM decreased substantially after the Mediated Learning Experience.

16.4. Correlation between score gains and sum score

Correlating score gains with RSPM total scores resulted in values of −.60 (p = .00) for the Black experimental group, −.18 (p = .38) for the Black control group, −.82 (p = .00) for the White/Indian/Colored experimental group, and −.48 (p = .08) for the White/Indian/Colored control group. After applying the correction formula of Tucker et al. (1966), these correlations became −.39, −.08, −.61, and −.35, respectively. Overall, these correlations show that low-g persons improved their scores more strongly than high-g persons.

17. Discussion

Skuy et al. (2002) hypothesized that the low-quality education of Blacks in South Africa would lead IQ tests to underestimate their cognitive abilities. Groups of Black and White/Indian/Colored students took Raven's Progressive Matrices twice and in between received Feuerstein's Mediated Learning Experience. The test scores went up substantially in all groups. Evidence for an authentic change in the g factor requires broad transfer, or generalizability, across a wide variety of cognitive performance. However, Skuy et al. show that the gains did not generalize to scores on another, highly similar test or to external criteria, and were therefore hollow. As the score gains were in some cases quite large (14 IQ points for the Black experimental group), the question becomes: what is it that improved?

The findings show that the correlations between score gains and g loadedness of the items were −.39 for the complete experimental group and −.26 for the complete control group. However, because the g loadings and gain scores are measured at the item level, their reliabilities are not high, resulting in substantial attenuation of the correlation between g and d. Moreover, the RSPM does not measure g perfectly: Jensen (1998a, p. 91) estimates its g loading at .83. When we estimate the reliability of the g vector at .70 and the reliability of the gain score vector at .50, corrections for unreliability and for deviation from perfect construct validity of g alone result in estimated true correlations of −.80 and −.53, respectively. These values should be taken as underestimates; controlling for additional artifacts would bring them closer to the very strong negative correlation found in the meta-analysis.
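These corrections can be checked with a few lines of arithmetic: divide the observed vector correlation by the square root of the product of the two vector reliabilities, then by the RSPM's g loading. A minimal sketch using the values given in the paragraph above (.70, .50, and .83); the function name is hypothetical. The unrounded results are about −.794 and −.529, matching the reported −.80 and −.53 to within rounding.

```python
import math


def correct_vector_r(r_obs, rel_g, rel_d, g_loading):
    """Correct a vector correlation for unreliability of both vectors
    and for the test's imperfect measurement of g."""
    return r_obs / math.sqrt(rel_g * rel_d) / g_loading


for label, r in [("experimental", -0.39), ("control", -0.26)]:
    print(label, round(correct_vector_r(r, 0.70, 0.50, 0.83), 3))
```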

The findings suggest that after training the g loadedness of the test decreased substantially. We found substantial negative correlations between gain scores and RSPM total scores. Table 4 shows that the total score variance decreased after training, which is in line with low-g participants increasing more than high-g participants. Since, as a rule, high-g individuals profit the most from training (as is reflected in the ubiquitous positive correlation between IQ scores and training performance; Jensen, 1980; Schmidt & Hunter, 1998), these findings could be interpreted as an indication that Feuerstein's Mediated Learning Experience is not g-loaded, in contrast with regular trainings, which are clearly g-loaded. Substantial negative correlations between gain scores and RSPM total scores are not definite proof of this hypothesis, but they are in line with it. Additional support for our hypothesis that the Feuerstein training has little or no g loadedness comes from Coyle (2006), who showed that gain scores loaded virtually zero on the g factor. Moreover, Skuy et al. reported that the predictive validity of their measure did not increase when the second Raven score was used. The fact that individuals with low g gained more than those with high g could be interpreted as an indication that the Mediated Learning Experience was not g-loaded. It should be noted, however, that Feuerstein most likely did not intend his intervention to be g-loaded; he was interested in increasing the performance of low scorers on both tests and external criteria.

18. General discussion

IQ scores are by far the best general predictor of success in education, job training, and work. However, there are many ways in which these IQ scores can be increased, for instance by means of retesting or by participating in a learning potential training program. What conclusions can be drawn from such score gains? Jensen's (1998a) hypothesis that the effects of training on abilities can be summarized in terms of Carroll's three-stratum hierarchical factor model was tested in a meta-analysis of test–retest data from Dutch, British, and American test batteries, and with learning potential data from South Africa using Raven's Progressive Matrices. The meta-analysis convincingly shows that test–retest score gains are not g-loaded. The findings from the learning potential study are clearly in line with this: when the attenuation caused by unreliability and other artifacts is taken into account, the correlation between the g loadings of items and the gains on items has a value somewhat comparable to the one found in the meta-analysis for test batteries. The data suggest that the g loadedness of item scores decreases after the intervention training. Te Nijenhuis et al.'s (2001) finding that practice and coaching reduced the g loadedness of their test scores strengthens the present findings using item scores. The findings show that it is not the high-g participants who increase their scores the most, as is common in training situations; rather, the low-g persons show the largest increases. This suggests that the intervention training is not g-loaded.

Our findings fit quite well with the hierarchical model of intelligence. The generalizability of test scores resides predominantly in the g component, whereas the test-specific ability component and the narrow ability component are virtually non-generalizable. This is evidenced, for instance, by the earlier finding that adding verbal tests or numerical tests to a g score resulted in only a very small incremental validity (Ree & Earles, 1991; Ree et al., 1994). Additionally, Ericsson and Lehmann (1996) reported immense gains for a memory task focusing on one narrow ability but did not find any improvement for comparable memory tasks focusing on another narrow ability. As the score gains are not related to g, the generalizable g component decreases, and since it is plausible that the Feuerstein training itself is not g-loaded, it is easy to understand why the score gains did not generalize to scores on the cognitively loaded Representational Stencil Design Test. For a similar reason, the score gains did not generalize to g-loaded external criteria: the correlation of the RSPM scores with performance in the end-of-year psychology examination did not significantly improve after mediation. Reeve and Lam (2005) claimed that retesting does not change the nature of what is being tested, but our findings suggest the opposite.

19. Limitations of the studies

Our meta-analysis and our analysis of the South African study are strongly based on the method of correlated vectors (MCV), which has recently been shown to have limitations. Dolan and Lubke (2001) have shown that, when comparing groups, substantial positive vector correlations can still be obtained even when groups differ not only on g but also on factors uncorrelated with g. Ashton and Lee (2005) show that associations of a variable with non-g sources of variance can produce a vector correlation of zero even when the variable is strongly associated with g. They suggest that the g loadings of a subtest are sensitive to the nature of the other subtests in a battery, so that a specific sample of subtests may cause a spurious correlation between the vectors. Notwithstanding these limitations, studies using MCV continue to appear (see, for instance, Colom, Haier, & Jung, in press; Hartmann, Kruuse, & Nyborg, in press; Lee et al., 2006). The outcomes of our meta-analysis of a large number of studies using MCV may make an interesting contribution to the discussion on the limitations of the method.

A principle of meta-analysis is that the amount of information contained in one individual study is quite modest. Therefore, one should carry out an analysis of all studies on one topic and correct for artifacts, which strongly increases the amount of information. The fact that our meta-analytical estimate of −1.06 is virtually identical to the theoretically expected correlation between g and d of −1.00 holds some promise that a psychometric meta-analysis of studies using MCV is a powerful way of reducing some of the limitations of MCV. An alternative methodological approach is to limit oneself to the rare datasets that enable the use of structural equation modeling. However, from a meta-analytical point of view, these studies yield only a quite modest amount of information.

Additional meta-analyses of studies employing MCV are necessary to establish the validity of the combination of MCV and psychometric meta-analysis. Most likely, many would agree that a high positive meta-analytical correlation between measures of g and measures of another construct implies that g plays a major role, and that a meta-analytical correlation of −1.00 implies that g plays no role. However, it is not clear what value of the meta-analytical correlation to expect from MCV when g plays only a modest role. After the present meta-analysis on a construct that clearly has an inverse relationship with g, it would be informative to carry out meta-analyses of studies on variables that are strongly linked to g and on variables that are modestly linked to g. An example of the latter would be secular score gains, which, according to Lynn's (1990) nutrition theory, should be modestly g-loaded.

The sample sizes in the South African study are not large, but they are still larger than those in many other studies of learning potential, where an N ≈ 10 is not unusual. A reanalysis of the many existing studies on dynamic testing could lead to a meta-analysis with a large combined N. The mean posttest score was quite high, so a ceiling effect may have occurred for the White/Indian/Colored group, leading to an underestimation of the experimental score gain for this group.

Instead of testing the hypothesis with a strongly unidimensional test such as the RSPM, it would be better to use a multidimensional test. Moreover, a large sample size would allow the use of more rigorous data-analytical techniques, leading to more definitive results. However, to the best of our knowledge, datasets meeting these requirements do not exist, and the Skuy et al. study is arguably the best South African learning potential study.

20. Score gains as low-quality measures of motivation

As criterion-related validity is strongly dependent on g, te Nijenhuis et al.'s finding of lowered g loadings after training should result in lowered criterion-related validity. However, the empirical findings show the opposite: virtually all test–retest and test preparation studies on cognitive tests and scholastic aptitude tests that reported both criterion-related validities demonstrate small to modest increases in criterion-related validity for the second or third test score (see Allalouf & Ben-Shakhar, 1998; Bashi, 1976; Coyle, 2006; Hausknecht, Trevor, & Farr, 2002; Jones, 1986; Linn, 1977; Olsen & Schrader, 1959; Ortar, 1960; Powers, 1985; Reeve & Lam, 2005). In the carefully designed study by Allalouf and Ben-Shakhar (1998) of a university entrance test, the experimental group received an intensive 40-h test coaching program, while the control group did not. The criterion-related validity for the retest increased for both groups. Most importantly, the increase was the same; it was not larger for the experimental group.

In a little-known but carefully designed large-scale learning potential study, Resing (1990, see Table 4.23) compared an experimental group that received a pretest, a learning potential training, and a posttest against a control group that received only the pretest and the posttest. The mean criterion-related validity of the various second scores was .62 for both the experimental and the control group. Learning potential training did not result in incremental criterion-related validity over and above the validity resulting from simple retesting. The findings of both Resing and Allalouf and Ben-Shakhar suggest that cognitive interventions do not increase criterion-related validity more than simple retesting does.

g and the personality measure conscientiousness have been shown to make an excellent combination of predictors (Schmidt & Hunter, 1998). Conscientiousness represents, among other characteristics, persistence, a will to achieve, and the ability to focus effort on a goal. A field study on test preparation using actual job applicants (Clause, Delbridge, Schmitt, Chan, & Jennings, 2001) showed that motivation to perform well on the test correlated .25 with test performance. One could speculate that score increases do not reflect a true cognitive component but rather become low-quality measures of motivation. Further, since the increase in validity due to retesting and learning potential training is modest in comparison with the large increase obtainable from the use of personality questionnaires, personality testing might provide a less expensive and more accurate alternative.

21. Effectiveness of various training formats

Components of the mediation training used by Skuy et al. (2002) are similar to the test training used in te Nijenhuis et al. (2001). Both the Dutch training and the South African training took 3 h, but whereas the Dutch training focused on two different test formats, the South African training dealt with only one test format. The test training by Lloyd and Pidgeon (1961) took even less time, namely two half-hour segments, each focusing on one test format. The effect sizes in all studies were roughly comparable. This suggests that the methodologies employed by te Nijenhuis et al. and by Lloyd and Pidgeon were more efficient than those used by Skuy et al. It is possible that the components of the mediation training that are not present in the other two training formats are not effective in raising test scores and could therefore be left out. If so, it might be possible to increase scores on the RSPM by one SD with a relatively simple 1-h training.

22. Generalizability of findings

Can these findings of hollow score gains after test–retest, test practice, and Mediated Learning Experience training be generalized to other studies in which training-induced score gains were found? Ericsson and Lehmann (1996) reported tremendous score increases after intensive training on numeric memory tests, but these gains did not generalize in the least to verbal memory tests. Such gains on one narrow ability do not generalize to another narrow ability clustering under the same broad ability and are therefore hollow. Similarly, Jensen (1998b) showed that score gains due to adoption were not on the g factor and were therefore most likely hollow.

Rushton (1999) argued that intergenerational score gains are not linked to g, suggesting that the Flynn effect may be empty, but he was strongly criticized by Flynn (1999, 2000). In studies on the Flynn effect, the score gains found in cross-sectional studies are largest on the RSPM (Flynn, 1987). Lynn (1998) has suggested that a substantial part of these intergenerational score gains on the RSPM is generalizable (they do reflect higher g), but that the remaining part is hollow and should be interpreted as schooling effects. The RSPM does require the application of the mathematical principles of addition, subtraction, progression, and the distribution of values. In the three decades (1950s–1980s) over which these increases in RSPM scores occurred, increasing proportions of 15- to 18-year-olds have remained in school, where they have learned math skills that they have applied to the solution of matrices problems. Our findings could be interpreted as support for Lynn's hypothesis of the partial hollowness of score gains on the RSPM. Notwithstanding the high g loading of the sum score of the RSPM, it is quite sensitive to test–retest effects and training effects. Some studies on the Flynn effect (Lynn & Hampson, 1986; Teasdale & Owen, 1989) show that the increase in scores is largely concentrated in the lower segments of the IQ distribution. Our finding that low scorers show the largest gains after training may additionally support the notion that part of the Flynn effect on the RSPM is hollow. Finally, Wicherts et al.'s (2004) findings show that in some of their datasets the secular score gains are most strongly linked to broad, narrow, and test-specific abilities, showing that an important part of the gains is non-generalizable.

Ceci (1991) showed that increased schooling leads to higher IQ scores, but are these gains highly specific or predominantly generalizable? It would be interesting to apply the techniques we used in this study to the findings from previous intervention studies. It may be that biological interventions (such as diet, vitamin supplements, and vaccination against infectious disease), rather than psychological or educational interventions, are the most cost-effective method of producing true changes in g and broad abilities. It may also be that there is a biological barrier between the first stratum and the second stratum that restricts the effects of behavioral interventions to narrow abilities and test specificities.

Acknowledgement

We would like to thank Mervyn Skuy for his permission to use his dataset.

Thanks to Marié de Beer, Raegan Murphy, Welko Tomic, Art Jensen, and Frank Schmidt for feedback on previous versions of this paper.

Thanks also to Arne Evers, Wilma Resing (Dutch Test Committee), and Andress Kooij (Harcourt) for helping to locate test–retest studies.

References

Ackerman, P. L. (1986). Individual differences in information processing: An investigation of intellectual abilities. Intelligence, 10, 101–139.

Ackerman, P. L. (1987). Individual differences in skill learning: An integration of psychometric and information processing skills. Psychological Bulletin, 102, 3–27.

Allalouf, A., & Ben-Shakhar, G. (1998). The effect of coaching on the predictive validity of scholastic aptitude tests. Journal of Educational Measurement, 35(1), 31–47.

Ashton, M. C., & Lee, K. (2005). Problems with the method of correlated vectors. Intelligence, 33, 431–444.

Bashi, Y. (1976). Verbal and non-verbal abilities of 4th, 6th and 8th grade students in the Arab educational system in Israel. Jerusalem: Hebrew University, School of Education.

298 J. te Nijenhuis et al. / Intelligence 35 (2007) 283–300

Bleichrodt, N., Resing, W. C. M., Drenth, P. J. D., & Zaal, J. N. (1987). Intelligentie-meting bij kinderen: Empirische en methodologische verantwoording van de geReviseerde Amsterdamse Kinder Intelligentie Test [Measuring the intelligence of children: Empirical and methodological justification of the Revised Amsterdam Children Intelligence Test]. Lisse, the Netherlands: Swets.

Bennett, G. K., Seashore, H. G., & Wesman, A. G. (1974). Differential Aptitude Tests (5th ed.). Manual. New York: The Psychological Corporation.

Boeyens, J. C. A. (1989). Learning potential: An empirical investigation. Pretoria, South Africa: Human Science Research Council.

Bosch, F. (1973). Inventarisatie, beschrijving en onderzoek m.b.t. de wijzigingen van de GATB, incl. test-hertest onderzoek (No. Pz3bRp0120) [Stock-taking, description and research concerning the modifications of the GATB, including a test–retest study]. Utrecht, the Netherlands: Nederlandse Spoorwegen.

Carroll, J. B. (1993). Human cognitive abilities: A survey of factor analysis studies. Cambridge University Press.

Ceci, S. J. (1991). How much does schooling influence general intelligence and its cognitive components? A reassessment of the evidence. Developmental Psychology, 27, 703–722.

Christian, K., Bachnan, H. J., & Morrison, F. J. (2001). Schooling and cognitive development. In R. J. Sternberg & E. L. Grigorenko (Eds.), Environmental effects on cognitive abilities (pp. 287–335). Mahwah, NJ: Erlbaum.

Clause, C. S., Delbridge, K., Schmitt, N., Chan, D., & Jennings, D. (2001). Test preparation activities and employment test performance. Human Performance, 14, 149–167.

Cohen, J. (1988). Statistical power analysis for the behavioral sciences. Hillsdale: Lawrence Erlbaum.

Colom, R., Jung, R. E., & Haier, R. J. (in press). Finding the g-factor in brain structure using the method of correlated vectors. Intelligence.

Covin, T. A. (1977). Stability of the WISC-R for 9-year-olds with learning difficulties. Psychological Reports, 40, 1297–1298.

Coyle, T. R. (2006). Test–retest changes on scholastic aptitude tests are not related to g. Intelligence, 34, 15–27.

Cronbach, L. J. (1990). Essentials of psychological testing. New York: HarperCollins.

de Villiers, A. B. (1999). Disadvantaged students' academic performance: Analysing the zone of proximal development. Unpublished D.Phil. thesis, University of Cape Town, South Africa.

de Wolff, C. J., & Buiten, B. (1963). Een factoranalyse van vier testbatterijen [A factor analysis of four test batteries]. Nederlands Tijdschrift voor Psychologie, 18, 220–239.

Dolan, C. V., & Lubke, G. (2001). Viewing Spearman's hypothesis from the perspective of multigroup PCA: A comment on Schönemann's criticism. Intelligence, 29, 231–245.

Drenth, P. J. D., Petrie, J. F., & Bleichrodt, N. (1968). Handleiding bij de Amsterdamse Kinder Intelligentie Test [Manual of the Amsterdam Children Intelligence Test]. Amsterdam: Vrije Universiteit.

Elliott, C. D. (1983). British Ability Scales. Manual 2: Technical handbook. Windsor, Great Britain: NFER-Nelson.

Engelbrecht, M. (1999). Leerpotensiaal as voorspeller van akademiese sukses van universiteitsstudente [Learning potential as predictor of the academic success of university students]. Unpublished D.Phil. thesis, Potchefstroom University for Christian Higher Education, South Africa.

Ericsson, K. A., & Lehmann, A. C. (1996). Expert and exceptional performance: Evidence of maximal adaptation to task constraints. Annual Review of Psychology, 47, 273–305.

Evers, A., & Lucassen, W. (1991). Handleiding DAT 83: Differentiële Aanleg Testserie [Manual DAT 83: Differential Aptitude Test series]. Amsterdam: Swets.

Fleishman, E. A., & Hempel, W. E. (1955). The relation between abilities and improvement with practice in a visual discrimination reaction task. Journal of Experimental Psychology, 49, 301–312.

Flynn, J. R. (1987). Massive IQ gains in 14 nations: What IQ tests really measure. Psychological Bulletin, 101, 171–191.

Flynn, J. R. (1999). Evidence against Rushton: The genetic loading of WISC-R subtests and the causes of between-group IQ differences. Personality and Individual Differences, 26, 373–379.

Flynn, J. R. (2000). IQ gains, WISC subtests and fluid g: g theory and the relevance of Spearman's hypothesis to race. In G. R. B. J. Goode (Ed.), The nature of intelligence (pp. 202–227). New York: Wiley.

Gaydon, V. P. (1988). Predictors of performance of disadvantaged adolescents on the Soweto/Alexandra gifted child programme. Unpublished M.Ed. dissertation, University of the Witwatersrand, South Africa.

Gottfredson, L. S. (1997). Why g matters: The complexity of everyday life. Intelligence, 24(1), 79–132.

Gottfredson, L. S. (2002). g: Highly general and highly practical. In R. J. Sternberg & E. L. Grigorenko (Eds.), The general intelligence factor: How general is it? (pp. 331–380). Mahwah, NJ: Erlbaum.

Grigorenko, E. L., & Sternberg, R. J. (1998). Dynamic testing. Psychological Bulletin, 124, 75–111.

Haeck, W., Yeld, N., Conradie, J., Robertson, N., & Shall, A. (1997). A developmental approach to mathematics testing for university admissions and course placement. Educational Studies in Mathematics, 33, 71–91.

Hartmann, P., Kruuse, N. H. S., & Nyborg, H. (in press). Testing the cross-racial generality of Spearman's hypothesis in two samples. Intelligence.

Hausknecht, J. P., Trevor, C. O., & Farr, J. L. (2002). Retaking ability tests in a selection setting: Implications for practice effects, training performance, and turnover. Journal of Applied Psychology, 87(2), 243–254.

Hunter, J. E., & Schmidt, F. L. (1990). Methods of meta-analysis. London: Sage.

Hunter, J. E., & Schmidt, F. L. (2004). Methods of meta-analysis (2nd ed.). London: Sage.

Jensen, A. R. (1980). Bias in mental testing. London: Methuen.

Jensen, A. R. (1985). The nature of the black–white difference on various psychometric tests: Spearman's hypothesis. Behavioral and Brain Sciences, 8, 193–263.

Jensen, A. R. (1998a). The g factor: The science of mental ability. London: Praeger.

Jensen, A. R. (1998b). Adoption data and two g-related hypotheses. Intelligence, 25, 1–6.

Johnson, W., Bouchard, T. J., Jr., Krueger, R. F., McGue, M., & Gottesman, I. I. (2004). Just one g: Consistent results from three test batteries. Intelligence, 32, 95–107.

Johnson, W., te Nijenhuis, J., & Bouchard, T. J., Jr. (in press). Replication of the hierarchical visual-perceptual-image rotation model in de Wolff and Buiten's (1963) battery of 46 tests of mental ability. Intelligence.

Jones, R. J. (1986). A comparison of the predictive validity of the MCAT for coached and uncoached students. Journal of Medical Education, 61, 335–338.

Kaufman, A. S., & Kaufman, N. L. (1983). K-ABC: Kaufman Assessment Battery for Children. Interpretive manual. Circle Pines, MN: AGS.


Kooij, A. P., Rolfhus, E., Wilkins, C., Yang, Z., & Zhu, J. (2005). WAIS-III Nederlandstalige bewerking: Technisch rapport hernormering [WAIS-III adaptation in Dutch: Technical report on renorming]. Amsterdam: Harcourt.

Kort, W., Schittekatte, M., Dekker, P. H., Verhaeghe, P., Compaan, E. L., Bosmans, M., & Vermeir, G. (2005). WISC-III NL: Wechsler Intelligence Scale for Children, Derde Editie NL. Handleiding en verantwoording [The Dutch WISC-III: Wechsler Intelligence Scale for Children, Third Edition for the Netherlands. Manual and justification]. Amsterdam: NIP.

Kulik, J. A., Bangert-Drowns, R. L., & Kulik, C. C. (1984). Effectiveness of coaching for aptitude tests. Psychological Bulletin, 95, 179–188.

Kulik, J. A., Kulik, C. C., & Bangert, R. L. (1984). Effects of practice on aptitude and achievement test scores. American Educational Research Journal, 21, 435–447.

Lee, K. H., Choi, Y. Y., Gray, J. R., Cho, S. H., Chae, J.-H., Lee, S., et al. (2006). Neural correlates of superior intelligence: Stronger recruitment of posterior parietal cortex. NeuroImage, 29(2), 578–586.

Linn, R. L. (1977). On the treatment of multiple scores for Law School Admission Test repeaters (Report LSAC-77-4). In Law School Admission Council, Reports of LSAC sponsored research: Volume III, 1975–1977. Princeton, NJ: Law School Admission Council.

Lipson, L. E. (1992). Relationship of static and dynamic measures to scholastic achievement of black pupils. Unpublished M.Ed. dissertation, University of the Witwatersrand, South Africa.

Lloyd, F., & Pidgeon, D. A. (1961). An investigation into the effects of coaching on non-verbal test material with European, Indian and African children. British Journal of Educational Psychology, 31, 145–151.

Luteijn, F., & Barelds, D. P. H. (2005). GIT2: Groninger Intelligentie Test 2 [GIT2: Groningen Intelligence Test 2]. Amsterdam: Harcourt.

Lynn, R. (1990). The role of nutrition in secular increases in intelligence. Personality and Individual Differences, 11, 273–285.

Lynn, R. (1998). In support of the nutrition theory. In U. Neisser (Ed.), The rising curve: Long-term gains in IQ and related measures (pp. 207–215). Washington, DC: American Psychological Association.

Lynn, R., Allik, J., & Irwing, P. (2004). Sex differences on three factors identified in Raven's Standard Progressive Matrices. Intelligence, 32, 411–424.

Lynn, R., & Hampson, S. (1986). The rise of national intelligence: Evidence from Britain, Japan and the USA. Personality and Individual Differences, 7, 23–32.

Matarazzo, J. D., Carmody, T. P., & Jacobs, L. D. (1980). Test–retest reliability and stability of the WAIS: A literature review with implications for clinical practice. Journal of Clinical Neuropsychology, 2(2), 89–105.

McCormick, B. K., Dunlap, W. P., Kennedy, R. S., & Jones, M. B. (1983). The effects of practice on the Armed Forces Vocational Aptitude Test Battery. US Army Research Institute for the Behavioral and Social Sciences, Technical Report 602.

Mulder, J. L., Dekker, R., & Dekker, P. H. (2004). Kaufman Intelligentietest voor adolescenten en volwassenen (KAIT). Handleiding [Kaufman Intelligence Test for adolescents and adults (KAIT): Manual]. Leiden, the Netherlands: PITS.

Murphy, R. (2002). A review of South African research in the field of dynamic assessment. Unpublished MA dissertation, University of Pretoria (available online from http://upetd.up.ac.za/thesis/available/etd-05042002-161239).

Nel, A. (1997). Die voorspelling van akademiese sukses binne konteks van 'n alternatiewe universiteitstoelatingsbeleid [The prediction of academic success within the context of an alternative policy of university admission]. Unpublished MA dissertation, Rand Afrikaans University, South Africa.

Neubauer, A. C., & Freudenthaler, H. H. (1994). Reaction time in a sentence-picture verification test and intelligence: Individual strategies and effects of extended practice. Intelligence, 19, 193–218.

Nunnally, J. C., & Bernstein, I. H. (1994). Psychometric theory (3rd ed.). New York: McGraw-Hill.

Olsen, M., & Schrader, W. B. (1959). The use of preliminary and final Scholastic Aptitude Test scores in predicting college grades (College Entrance Examination Board Research and Development Reports and Statistical Reports 59-19). Princeton, NJ: Educational Testing Service.

Ortar, G. R. (1960). Improving test validity by coaching. Educational Research, 2, 137–142.

Powers, D. E. (1985). Effects of test preparation on the validity of Graduate Admission Test. Applied Psychological Measurement, 9, 179–190.

Raven, J., Raven, J. C., & Court, J. H. (2000). Standard Progressive Matrices. Raven manual: Section 3. Oxford Psychologists Press.

Ree, M. J., & Carretta, T. R. (1994). The correlation of general cognitive ability and psychomotor tracking tests. International Journal of Selection and Assessment, 2, 209–216.

Ree, M. J., & Earles, A. A. (1991). Predicting training success: Not much more than g. Personnel Psychology, 44, 321–332.

Ree, M. J., Earles, J. A., & Teachout, M. S. (1994). Predicting job performance: Not much more than g. Journal of Applied Psychology, 79, 518–524.

Reeve, C. L., & Lam, H. (2005). The psychometric paradox of practice effects due to retesting: Measurement invariance and stable ability estimates in the face of observed score changes. Intelligence, 33, 535–549.

Resing, W. C. M. (1990). Intelligentie en leerpotentieel: Een onderzoek naar het leerpotentieel van jonge leerlingen uit het basis- en speciaal onderwijs [Intelligence and learning potential: A study into the learning potential of young students in basic and special education]. Amsterdam, the Netherlands: Swets.

Rushton, J. P. (1999). Secular gains in IQ are not related to the g factor and inbreeding depression, unlike black–white differences: A reply to Flynn. Personality and Individual Differences, 26, 381–389.

Rushton, J. P., Skuy, M., & Bons, T. A. (2004). Construct validity of Raven's Advanced Progressive Matrices for African and non-African engineering students in South Africa. International Journal of Selection and Assessment, 12(3), 220–229.

Schmidt, F. L., & Hunter, J. E. (1998). The validity and utility of selection methods in personnel psychology: Practical and theoretical implications of 85 years of research findings. Psychological Bulletin, 124(2), 262–274.

Schmidt, F. L., & Hunter, J. E. (1999). Theory testing and measurement error. Intelligence, 27(3), 183–198.

Schmidt, F. L., & Le, H. (2004). Software for the Hunter-Schmidt meta-analysis methods. University of Iowa, Department of Management and Organization, Iowa City, IA 42242.

Schroots, J. J. F., & van Alphen de Veer, R. J. (1979). LDT: Leidse Diagnostische Test. Deel 1: Handleiding [LDT: Leiden Diagnostic Test. Part 1: Manual]. Lisse, the Netherlands: Swets.

Shochet, I. M. (1986). Manifest and potential performance in advantaged and disadvantaged students. Unpublished D.Phil. dissertation, University of the Witwatersrand, South Africa.


Skuy, M., Gewer, A., Osrin, Y., Khunou, D., Fridjon, P., & Rushton, J. P. (2002). Effects of mediated learning experience on Raven's Matrices scores of African and non-African university students in South Africa. Intelligence, 30, 221–232.

Swanson, H. E., & Lussier, C. M. (2001). A selective synthesis of the experimental literature on dynamic assessment. Review of Educational Research, 71, 321–363.

Teasdale, T. W., & Owen, D. R. (1989). Continuing secular increase in intelligence and a stable prevalence of high intelligence levels. Intelligence, 13, 255–262.

Tuma, J. M., & Appelbaum, A. S. (1980). Reliability and practice effects of WISC-R IQ estimates in a normal population. Educational and Psychological Measurement, 40, 671–678.

te Nijenhuis, J., Tolboom, E., Resing, W., & Bleichrodt, N. (2004). Does cultural background influence the intellectual performance of children from immigrant groups? Validity of the RAKIT intelligence test for immigrant children. European Journal of Psychological Assessment, 20, 10–26.

te Nijenhuis, J., & van der Flier, H. (1997). Comparability of GATB scores for immigrants and majority group members: Some Dutch findings. Journal of Applied Psychology, 82, 675–687.

te Nijenhuis, J., Voskuijl, O. F., & Schijve, N. B. (2001). Practice and coaching on IQ tests: Quite a lot of g. International Journal of Selection and Assessment, 9, 302–308.

Thorndike, R. L. (1985). The central role of general ability in prediction. Multivariate Behavioral Research, 20, 241–254.

Tucker, L. R., Damarin, F., & Messick, S. (1966). A base-free measure of change. Psychometrika, 31(4), 457–473.

van der Doef, M. P., Kwint, J. M., & van der Koppel (1989). Wat leren moeilijk lerende kinderen van de WISC-R? [What do children who have difficulties in learning learn from the WISC-R?]. Kind en Adolescent, 10, 136–141.

United States Department of Labor. (1970). Manual for the USTES General Aptitude Test Battery. Section III: Development. Washington, DC: United States Department of Labor.

van Geffen (1972). De betrouwbaarheid van de GATB 1002-B op brugklasniveau [The reliability of the GATB 1002-B for the first class at secondary school]. Catholic University Nijmegen, the Netherlands: Psychology of Work and Organisation.

van Haasen, P. P., de Bruyn, E. E. J., Pijl, Y. J., Poortinga, Y. H., Lutje Spelberg, H. C., Vander Steene, G., et al. (1986). WISC-R: Wechsler Intelligence Scale for Children-Revised, Nederlandstalige uitgave [WISC-R: Wechsler Intelligence Scale for Children-Revised, Dutch edition]. Lisse, the Netherlands: Swets.

Wechsler, D. (1955). Manual for the Wechsler Adult Intelligence Scale. New York: The Psychological Corporation.

Wechsler, D. (1967). Manual for the Wechsler Preschool and Primary Scale of Intelligence. New York: The Psychological Corporation.

Wechsler, D. (1974). Manual for the Wechsler Intelligence Scale for Children-Revised. New York: The Psychological Corporation.

Wechsler, D. (1981). WAIS-R manual: Wechsler Adult Intelligence Scale-Revised. New York: The Psychological Corporation.

Wechsler, D. (1997). WAIS-III: Wechsler Adult Intelligence Scale-third edition and WMS-III: Wechsler Memory Scale-third edition. Technical manual. New York: The Psychological Corporation.

Wicherts, J. W., Dolan, C. V., Oosterveld, P., van Baal, G. C. V., Boomsma, D. I., & Span, M. M. (2004). Are intelligence tests measurement invariant over time? Investigating the nature of the Flynn effect. Intelligence, 32(5), 509–537.

Yeld, N., & Haeck, W. (1997). Educational histories and academic potential: Can tests deliver? Assessment and Evaluation in Higher Education, 22, 5–16.

Zaaiman, H. (1998). Selecting students for mathematics and science: The challenge facing higher education in South Africa. Pretoria, South Africa: HSRC Publishers.

Zaaiman, H., van der Flier, H., & Thijs, G. D. (2001). Dynamic testing in selection for an educational programme: Assessing South African performance on the Raven Progressive Matrices. International Journal of Selection and Assessment, 9, 258–269.

Zolezzi, S. A. (1992). Alternative selection measures for university undergraduate admissions. Unpublished M.Ed. dissertation, University of the Witwatersrand, South Africa.

Zolezzi, S. A. (1995). The effectiveness of dynamic assessment as an alternative aptitude testing strategy. Unpublished D.Phil. dissertation, University of South Africa, South Africa.


took place using a conservative value of .90. This resulted in a value of −1.06 for the final estimated true correlation between g loadings and score gains. Applying several corrections in a meta-analysis may lead to correlations that are larger than 1.00 or smaller than −1.00, as is the case here. Percentages of variance accounted for by artifacts larger than 100% are also not uncommon in psychometric meta-analysis, and they occur in other methods of statistical estimation as well (see Hunter & Schmidt, 1990, pp. 411–414, for a discussion).
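How stacked corrections can push an estimate past −1.00 can be illustrated with the classical correction for attenuation, in which an observed correlation is divided by the square root of a reliability. The sketch below is illustrative only: the observed value of −.95 is hypothetical, and treating .90 as the reliability being corrected for is an assumption, because this passage does not say which artifact that value refers to. The function name is ours.

```python
import math


def correct_for_attenuation(r_observed, *reliabilities):
    """Divide an observed correlation by the square root of each reliability.

    This is the classical disattenuation formula. Nothing in it clips the
    result to [-1, 1], so a correlation that is already close to -1 can be
    pushed past it.
    """
    attenuation = 1.0
    for rel in reliabilities:
        if not 0.0 < rel <= 1.0:
            raise ValueError("reliability must lie in (0, 1]")
        attenuation *= math.sqrt(rel)
    return r_observed / attenuation


# Hypothetical observed correlation, corrected with an assumed reliability of .90
print(round(correct_for_attenuation(-0.95, 0.90), 3))  # -1.001
```

Each correction divides by a factor at or below 1, so the estimate can only grow in magnitude; an out-of-range value signals that the artifact estimates are slightly too pessimistic, not that the formula was misapplied.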

10. Discussion

A large-scale meta-analysis of 64 test–retest studies shows that, after corrections for several artifacts, there is an estimated true correlation of −1.06 between the g loadings of tests and score gains, and that virtually all of the variance in observed correlations is attributable to these artifacts. Because artifacts explain virtually all the variance in the effect sizes, other dimensions on which the studies differ, such as age of the test takers, test–retest interval, test used, and average-IQ samples versus samples with learning problems, play no role at all.
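The claim that artifacts account for virtually all of the variance rests on comparing the observed variance of the study correlations with the variance the artifacts are expected to produce. For sampling error alone, the Hunter–Schmidt bare-bones estimate is (1 − r̄²)² / (N̄ − 1), where r̄ is the sample-size-weighted mean correlation and N̄ the mean sample size. The sketch below implements only this sampling-error step, using three hypothetical studies; the meta-analysis reported here also corrected for further artifacts, which the sketch does not model. It shows how the percentage can exceed 100% when the observed variance happens to be smaller than expected.

```python
def bare_bones_meta(rs, ns):
    """Hunter-Schmidt bare-bones step (sampling error only).

    rs: observed correlations, one per study; ns: their sample sizes.
    Returns the N-weighted mean r, the N-weighted observed variance, the
    expected sampling-error variance, and the percentage of observed
    variance that sampling error accounts for.
    """
    total_n = sum(ns)
    r_bar = sum(n * r for r, n in zip(rs, ns)) / total_n
    var_obs = sum(n * (r - r_bar) ** 2 for r, n in zip(rs, ns)) / total_n
    n_bar = total_n / len(ns)
    var_err = (1 - r_bar ** 2) ** 2 / (n_bar - 1)
    pct = 100 * var_err / var_obs if var_obs > 0 else float("inf")
    return r_bar, var_obs, var_err, pct


# Three hypothetical studies whose correlations barely differ
r_bar, var_obs, var_err, pct = bare_bones_meta([-0.80, -0.85, -0.82], [50, 60, 70])
print(f"mean r = {r_bar:.3f}, variance due to sampling error = {pct:.0f}%")
```

With these invented inputs the observed spread is much smaller than sampling error alone would predict, so the percentage lands well above 100%, which is the situation the text describes as not uncommon.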

The estimated true correlation of −1.06 is the result of various corrections for artifacts that attenuate the correlations. The estimated values of the artifacts may underestimate or overestimate their population values. Therefore, estimates of true effect sizes may overestimate or underestimate the population values of the effect size. As a solution to this problem, Hunter and Schmidt (2004) suggest carrying out several meta-analyses on the same construct and taking the average estimated effect size across all of them. The general idea is that meta-analysis is a powerful research tool but does not give perfect outcomes.

A correlation of −1.06 falls outside the range of admissible values of a correlation, but one has to distinguish between the meta-analytical estimate of the true correlation between g and d and the true correlation between g and d itself. We interpret the meta-analytical estimate of −1.06 as meaning that the true correlation between g and d is −1.00. A correlation of −1.00 means that there is an inverse relationship between g and score gains: the tests with the highest g loadings show the smallest gains. The most straightforward interpretation of this very large negative correlation is that there is no g saturation in test–retest gain scores.

11. The South African learning potential study

In a carefully carried-out study, Skuy et al. (2002) used a dynamic testing procedure to see whether it would improve the scores of Black South African students on Raven's Standard Progressive Matrices (RSPM). The Bantu Education Act of 1954 established a discriminatory educational system characterized by poorly qualified teachers, sparsely equipped and funded schools, and generally poor quality. Most Black students in the sample had not received the same quality of education as White students. Black, White, Indian, and Colored research participants took the RSPM on two occasions, and in between, randomly constituted experimental groups were exposed to the Mediated Learning Experience. Both the Black South African group and the group consisting of White, Indian, and Colored South Africans improved over their baseline on the RSPM, and the Black group showed greater improvement.

The value of these cognitive interventions increases when the score gains transfer to other tests and to external criteria, such as school or work achievement. Therefore, the research participants also took Feuerstein's Representational Stencil Design Test as a transfer measure. The subject is presented with a stencil of a geometric design and then asked to point to which stencils need to be used, and in what sequence, in order to construct an identical design. Like the RSPM, the Stencils test requires representational/abstract thinking, but the training on the RSPM showed little transfer to it. Moreover, the correlation of the RSPM scores with performance in the end-of-year psychology examination did not significantly improve after mediation. Once again, the score gains were empty: they did not generalize. Skuy et al. go on to ask what it was that their interventions improved. Professor Skuy made his data accessible to the present authors so that we could perform additional analyses.

12. Sample

The data from Skuy et al. (2002) were used, with the exception of data from three research participants whose pretest IQ scores were extremely low (more than 3 SDs below the group mean). Ninety-five university students in psychology, aged 16 to 29 (mean age = 20, SD = 2.3; 25 males, 70 females), participated in this study. They were 66 Black students (20 males, 46 females) and 29 White (20), Indian (6), and Colored (3) students (5 males, 24 females). The mean age was 20 years (SD = 2.5) for the Black group and 19 years (SD = 1) for the White/Indian/Colored group. Subjects were randomly assigned to the experimental group (n = 55) or to the control group (n = 40).


13. Procedure

The students participated in pre- and posttest phases, with a group intervention in between. The study focused on improvement in scores on the RSPM, using the Set Variations II of the Learning Propensity Assessment Device as the mediation task. Mediation training took 3 h and was conducted by three experienced psychologists with the assistance of six postgraduate psychology students. A detailed description is given in Skuy et al. (2002).

14. Measures and cognitive intervention

Raven's Standard Progressive Matrices consists of 60 items (divided into 5 sets of 12 items) designed to measure the ability to form comparisons, to reason by analogy, and to organize spatial information into related wholes. It has been established as one of the purest measures of g (Jensen, 1998a). Skuy et al. (2002) found no evidence for test bias against Blacks in South African education, and Rushton, Skuy, and Bons (2004) showed that the Raven's gave comparable predictive validities for students from various groups. Cross-cultural testing research has clearly shown that insufficient proficiency in the language of the test can lead to biased assessments on tests with a strong verbal component; the Raven's, however, is a non-verbal test.

The Learning Propensity Assessment Device consists of 14 exercises. Each exercise contains an initial mediation task; subsequent tasks increase in complexity and novelty and aim to help the learner achieve mastery over the task. The purpose of mediation is to help the learner develop the appropriate cognitive strategies and functions needed for the successful completion of the task. The Set Variations II of the Learning Propensity Assessment Device consists of five sets of items, which comprise variations of Sets C, D, and E of the RSPM. Each set of variations contains a learning task for the purpose of initial mediation, followed by a series of progressively more difficult variations to which the learned skills must be applied. Mediation involves discussing with groups how to define the problem to be solved, focus on the task, set rules, regulate problem-solving behavior, and identify the correct sequence of logical steps needed to solve the task. Mediation also involves helping the subject develop appropriate concepts, verbal tools, and insights in relation to the task. A detailed description is given in Skuy et al. (2002).

15. Statistical analyses

Although the Skuy et al. study is among the South African learning potential studies with the largest sample size, the N is not large. We therefore chose basic statistical analyses.

15.1. Descriptive statistics

Means, standard deviations, and reliabilities were computed for the various groups. With regard to measures of effect size, Hunter and Schmidt (1990, p. 271) advise choosing the variance estimates with the least error. Because repeated test taking tends to change the size of the SD (Ackerman, 1987), we used the SD of the pretest as the denominator. The correlation between scores before and after the training was computed to see whether the training affected the rank order of individuals' scores.
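The effect-size definition above (posttest mean minus pretest mean, divided by the pretest SD) can be checked against the group means and pretest SDs reported in Table 4 of the Results. The function name is ours; the numbers are copied from Table 4, and the computed values match the tabled effect sizes of 0.95, 0.43, 0.93, and 0.77.

```python
def pretest_sd_effect_size(mean_pre, mean_post, sd_pre):
    """Standardized gain: (posttest mean - pretest mean) / pretest SD."""
    return (mean_post - mean_pre) / sd_pre


# (pretest M, posttest M, pretest SD) per group, as reported in Table 4
TABLE_4 = {
    "Black experimental": (43.78, 50.10, 6.64),
    "Black control": (45.46, 48.35, 6.69),
    "Other experimental": (50.20, 55.80, 6.05),
    "Other control": (52.71, 55.36, 3.45),
}

for group, (m_pre, m_post, sd_pre) in TABLE_4.items():
    print(f"{group}: d = {pretest_sd_effect_size(m_pre, m_post, sd_pre):.2f}")
```

Using the pretest SD keeps the denominator unaffected by the change in spread that retesting can cause; for the White/Indian/Colored experimental group, for instance, the posttest SD (3.76) is much smaller than the pretest SD (6.05).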

15.2. Correlation between score gains and g loadedness

Because our sample was small and quite specific, estimates of g loadedness were taken from Lynn, Allik, and Irwing's (2004) item analysis of the RSPM in Estonia, which used a large (N = 2735), nationally representative sample. The same reasoning as in psychometric meta-analysis applies, namely that larger samples give better estimates of g loadings than smaller samples. In a hierarchical factor analysis of the items using structural equation modeling, Lynn et al. computed g loadings for 52 of the 60 items. In the present study, Pearson correlations were calculated between the g loadings of these 52 items and the effect sizes on these items.
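The correlated-vectors step described above, correlating item g loadings with item effect sizes, can be sketched as follows. Two assumptions are ours. First, each item's effect size is taken as the change in proportion correct divided by the pretest item SD, √(p(1 − p)), which is undefined for items that everyone passed or failed at pretest. Second, the five-item vectors are invented for illustration; they are neither Lynn et al.'s loadings nor the Skuy et al. data.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def item_effect_size(p_pre, p_post):
    """Gain on a 0/1 item in pretest-SD units (item SD = sqrt(p * (1 - p)))."""
    if not 0.0 < p_pre < 1.0:
        raise ValueError("pretest proportion must lie strictly between 0 and 1")
    return (p_post - p_pre) / math.sqrt(p_pre * (1.0 - p_pre))


# Hypothetical vectors for five items (not the Lynn et al. loadings)
g_loadings = [0.30, 0.45, 0.55, 0.65, 0.75]
p_pretest = [0.60, 0.55, 0.50, 0.45, 0.40]
p_posttest = [0.80, 0.70, 0.60, 0.50, 0.42]

gains = [item_effect_size(a, b) for a, b in zip(p_pretest, p_posttest)]
print(f"r(g loading, d) = {pearson(g_loadings, gains):.2f}")
```

In this constructed example the most g-loaded items gain least, so the correlation is strongly negative; the direction of the real correlation is an empirical question answered in the Results.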

15.3. g loadings

The RSPM consists of dichotomous items, so we computed a matrix of polychoric correlations (Nunnally & Bernstein, 1994) and carried out a principal axis factor analysis on it. The percentage of variance explained by the first unrotated factor was taken as an estimate of g loadedness. Because sample size was limited, we collapsed the experimental and control groups.
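The g-loadedness estimate described above (the variance explained by the first unrotated factor of a principal axis factor analysis) can be sketched in plain Python. Several assumptions are ours. The input is an already-computed correlation matrix; estimating polychoric correlations from 0/1 items is a separate step not shown. Communalities start at each variable's largest absolute off-diagonal correlation. The dominant eigenpair of the reduced matrix is found by power iteration, which presumes a positive first eigenvalue, as in a positive manifold. The function name and the 3 × 3 matrix are illustrative and are not the authors' code or data.

```python
import math


def first_factor_paf(corr, max_iter=2000, tol=1e-10):
    """Iterated principal axis factoring, extracting only the first factor.

    corr: correlation matrix as a list of lists. Communalities are replaced
    by the squared first-factor loadings until they stop changing.
    Returns (loadings, percentage of total variance explained).
    """
    p = len(corr)
    h = [max(abs(corr[i][j]) for j in range(p) if j != i) for i in range(p)]
    v = [1.0 / math.sqrt(p)] * p  # start vector for power iteration
    loadings = [0.0] * p
    for _ in range(max_iter):
        # Reduced correlation matrix: communalities on the diagonal
        reduced = [[h[i] if i == j else corr[i][j] for j in range(p)] for i in range(p)]
        # Power iteration for the dominant eigenpair, warm-started from the last v
        for _ in range(100):
            w = [sum(reduced[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        eigval = sum(v[i] * sum(reduced[i][j] * v[j] for j in range(p)) for i in range(p))
        loadings = [x * math.sqrt(eigval) for x in v]
        new_h = [x * x for x in loadings]
        converged = max(abs(a - b) for a, b in zip(new_h, h)) < tol
        h = new_h
        if converged:
            break
    percent = 100.0 * sum(x * x for x in loadings) / p
    return loadings, percent


# Constructed one-factor matrix: r_ij = l_i * l_j with loadings .8, .7, .6
corr = [[1.0, 0.56, 0.48], [0.56, 1.0, 0.42], [0.48, 0.42, 1.0]]
loadings, pct = first_factor_paf(corr)
print([round(x, 2) for x in loadings], round(pct, 1))
```

For this constructed matrix the loadings should come back close to .80, .70, and .60, with the first factor explaining about 49.7% of the variance. Statistical packages commonly start from squared multiple correlations instead; the simpler starting values here avoid a matrix inverse.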

15.4. Correlation between sum scores and score gains

We tested whether individuals with low g improved their scores more than those with high g by correlating gain scores with pretest RSPM scores for each of the four research groups. Because gain scores tend to be negatively correlated with pretest scores as a function of unreliability (see Cronbach, 1990; Nunnally & Bernstein, 1994), we corrected the correlations using Tucker, Damarin, and Messick's (1966) formula 63. Using this formula, one adds a term to each observed correlation r between pretest and gain score:

$$r_{\text{corrected}} = r + \frac{SD_{\text{pretest}}}{SD_{\text{gain score}}}\,\bigl(1 - r_{xx,\text{pretest}}\bigr),$$

where $r_{xx,\text{pretest}}$ is the reliability of the pretest.

Table 3. Proportion of sample selecting the correct answer on items of Raven's Standard Progressive Matrices, by group

Set A                Set B                Set C                Set D                Set E
Item  Black  Other   Item  Black  Other   Item  Black  Other   Item  Black  Other   Item  Black  Other
   1   1.00   1.00     13   1.00   1.00     25   1.00    .97     37   1.00   1.00     49    .74    .90
   2    .97   1.00     14   1.00   1.00     26    .96   1.00     38    .99   1.00     50    .64    .90
   3    .97   1.00     15   1.00   1.00     27    .96   1.00     39    .89   1.00     51    .79    .97
   4   1.00    .97     16    .91    .97     28    .86    .93     40    .92   1.00     52    .56    .83
   5   1.00   1.00     17    .96    .97     29    .94    .97     41    .96   1.00     53    .52    .83
   6    .99   1.00     18    .85   1.00     30    .76    .83     42    .92   1.00     54    .35    .76
   7    .94    .97     19    .77    .66     31    .88    .97     43    .77   1.00     55    .42    .79
   8    .91    .93     20    .79    .97     32    .50    .79     44    .76    .93     56    .21    .69
   9   1.00    .97     21    .83    .97     33    .74    .90     45    .71    .97     57    .30    .41
  10    .91    .97     22    .92   1.00     34    .61    .79     46    .79    .93     58    .12    .41
  11    .83    .90     23    .80    .90     35    .53    .69     47    .29    .41     59    .02    .17
  12    .68    .83     24    .59    .83     36    .06    .35     48    .26    .38     60    .11    .21

Note. Other = White, Indian, and Colored.
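The correction above can be sketched as follows: compute the raw correlation between pretest score and gain, then add (SD_pretest / SD_gain)(1 − pretest reliability). The function names, the five test takers' scores, and the pretest reliability of .76 are assumptions made for illustration; this is not the authors' code.

```python
import math
import statistics


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pretest_gain_correlation(pretest, posttest, reliability_pretest):
    """Return (raw, corrected) correlations between pretest score and gain score.

    The correction adds (SD_pretest / SD_gain) * (1 - reliability_pretest),
    offsetting the negative bias that pretest measurement error induces.
    """
    gains = [post - pre for pre, post in zip(pretest, posttest)]
    raw = pearson(pretest, gains)
    sd_ratio = statistics.stdev(pretest) / statistics.stdev(gains)
    return raw, raw + sd_ratio * (1.0 - reliability_pretest)


# Hypothetical scores for five test takers and an assumed pretest reliability of .76
raw_r, corrected_r = pretest_gain_correlation([40, 45, 50, 55, 60], [48, 50, 54, 57, 61], 0.76)
print(f"raw r = {raw_r:.2f}, corrected r = {corrected_r:.2f}")
```

Because the added term is never negative, the correction always moves the correlation upward; a corrected value that is still negative indicates that low pretest scorers gained more than measurement error alone would explain.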

16. Results

16.1. Descriptive statistics

Internal consistencies (Cronbach's αs) on the RSPM ranged from .76 to .86 for the pre- and posttests, respectively. Table 3 shows the proportion of each group that selected the correct answer on each of the 60 items of the pretest. Across the 60 items, the order of the p values was almost identical for Blacks and White/Indian/Coloreds (r = .92, p = .00).
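The agreement between the two groups' item difficulties can be recomputed from the proportions transcribed from Table 3. The text does not say which correlation coefficient was used; the sketch assumes a Pearson correlation over the 60 item pairs, which with these rounded values should land close to the reported .92. As a side check, the sum of the Black group's p values (44.46) is the expected number of items correct, which agrees closely with the Black pretest mean of 44.44 given in the Results.

```python
import math

# Proportions correct on the 60 RSPM pretest items, transcribed from Table 3
# (items 1-12 = Set A, 13-24 = Set B, 25-36 = Set C, 37-48 = Set D, 49-60 = Set E)
BLACK = [
    1.00, .97, .97, 1.00, 1.00, .99, .94, .91, 1.00, .91, .83, .68,
    1.00, 1.00, 1.00, .91, .96, .85, .77, .79, .83, .92, .80, .59,
    1.00, .96, .96, .86, .94, .76, .88, .50, .74, .61, .53, .06,
    1.00, .99, .89, .92, .96, .92, .77, .76, .71, .79, .29, .26,
    .74, .64, .79, .56, .52, .35, .42, .21, .30, .12, .02, .11,
]
OTHER = [  # White, Indian, and Colored students
    1.00, 1.00, 1.00, .97, 1.00, 1.00, .97, .93, .97, .97, .90, .83,
    1.00, 1.00, 1.00, .97, .97, 1.00, .66, .97, .97, 1.00, .90, .83,
    .97, 1.00, 1.00, .93, .97, .83, .97, .79, .90, .79, .69, .35,
    1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, .93, .97, .93, .41, .38,
    .90, .90, .97, .83, .83, .76, .79, .69, .41, .41, .17, .21,
]


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


print(f"r(Black p values, Other p values) = {pearson(BLACK, OTHER):.2f}")
print(f"expected number correct, Black group = {sum(BLACK):.2f}")
```

A rank-order coefficient would be an equally reasonable reading of "the order of the p values"; with many tied values at 1.00 it would need a tie-aware implementation.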

Table 4 shows the means and standard deviations for the total RSPM scores for the four groups, along with the d effect sizes representing the difference between pre- and posttest scores (Cohen, 1988). First, we examined whether there was an effect of race (Black vs. White/Indian/Colored) and group (experimental vs. control) on the pretest scores. There was a significant effect due to race (F(1, 91) = 24.13, p = .00, η² = .21) but not group (F(1, 91) = 2.28, p = .14, η² = .02). This means that the mean pretest scores of Blacks (M = 44.44, SD = 6.65) were lower than those of White/Indian/Coloreds (M = 51.41, SD = 5.05), and that the mean pretest scores of the experimental and control groups were comparable (M = 45.53, SD = 7.04 and M = 48, SD = 6.7, respectively).

Table 4
Pre- and posttest mean Raven's scores, standard deviations, and mean effect sizes for Black and White/Indian/Colored students

                Black experimental    Black control         Other(a) experimental Other(a) control
                (n = 40)              (n = 26)              (n = 15)              (n = 14)
                Pretest   Posttest    Pretest   Posttest    Pretest   Posttest    Pretest   Posttest
Raw score M     43.78     50.10       45.46     48.35       50.20     55.80       52.71     55.36
Raw score SD    6.64      5.31        6.69      6.71        6.05      3.76        3.45      3.43
Percentile      14        41          16        31          41        75          55        68
Effect size (d) 0.95                  0.43                  0.93                  0.77

Percentiles are based on US adult norms; see Raven, Raven, and Court (2000), Table SPM13.
(a) Other = White, Indian, and Colored.

294 J. te Nijenhuis et al. / Intelligence 35 (2007) 283–300

Secondly, we investigated the effects of training on the posttest scores by performing a two-way ANCOVA on the total posttest scores, with race and group as factors and the total pretest scores as the covariate. There was a significant effect for group (F(1, 95) = 13.81, p = .00, η² = .13) and for race (F(1, 90) = 3.99, p = .05, η² = .04), but not for the two-way interaction of group and race (F(1, 90) = 0.28, p = .60, η² = .00). These results indicate that the training was equally effective for the Black and the White/Indian/Colored students. However, the posttest scores of Blacks (M = 49.41, SD = 5.91) remained significantly lower (F(1, 91) = 28.33, p = .00) than those of White/Indian/Coloreds (M = 55.59, SD = 3.55). Although the posttest scores of the experimental group (M = 51.65, SD = 5.53) were higher than those of the control group (M = 50.8, SD = 6.65), the difference between the two groups was nonsignificant (F(1, 91) = 0.85, p = .36).
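The η² values reported with the F tests match partial η² computed from each F ratio and its degrees of freedom, η²_p = F·df_effect / (F·df_effect + df_error). The text does not state how η² was obtained, so the sketch below assumes this formula. The function name is hypothetical. Under that assumption, the sketch reproduces every reported value to two decimals.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F ratios and degrees of freedom as reported in the text
reported = {
    "race (pretest)": (24.13, 1, 91),
    "group (pretest)": (2.28, 1, 91),
    "group (ANCOVA)": (13.81, 1, 95),
    "race (ANCOVA)": (3.99, 1, 90),
    "group x race (ANCOVA)": (0.28, 1, 90),
}

for label, (f, df1, df2) in reported.items():
    print(f"{label}: eta^2 = {partial_eta_squared(f, df1, df2):.2f}")
```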

The correlation between scores before and after the training was .84 (p = .00) for the experimental group and .90 (p = .00) for the control group. The training therefore had only a limited effect on the rank order of individuals' scores, which means that the test measures largely, though not perfectly, the same constructs on both occasions.

16.2. Correlation between score gains and g loadedness

We estimated effect sizes for each of the four groups (race by condition) by computing the difference between the mean pretest and posttest scores, divided by the standard deviation of the pretest scores of the Black and the White/Indian/Colored students, respectively. Finally, we calculated the correlations between the effect sizes and the g loadings taken from Lynn et al. The correlations were −.24 (p = .10) for the Black experimental group, −.21 (p = .20) for the White/Indian/Colored experimental group, −.08 (p = .59) for the Black control group, and −.41 (p = .01) for the White/Indian/Colored control group. Small sample sizes usually attenuate correlations (Hunter & Schmidt, 1990). Collapsing the groups indeed resulted in higher average correlations: −.39 for the complete experimental group and −.26 for the complete control group.
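This paragraph involves two computations. The first is a gain effect size, (posttest mean − pretest mean) / pretest SD. The second is a Pearson correlation between the vector of these effect sizes and the vector of g loadings, which is the method of correlated vectors. Both can be sketched as follows; the helper names are hypothetical. In the study, the inputs are item-level means and SDs. The check below uses the group totals from Table 4 only because those numbers appear in the text.

```python
import numpy as np


def gain_effect_size(mean_pre, mean_post, sd_pre):
    """d = (posttest mean - pretest mean) / pretest SD, elementwise."""
    mean_pre = np.asarray(mean_pre, dtype=float)
    mean_post = np.asarray(mean_post, dtype=float)
    return (mean_post - mean_pre) / np.asarray(sd_pre, dtype=float)


def correlated_vectors(g_loadings, effect_sizes):
    """Pearson correlation between a vector of g loadings and a vector of d values."""
    g = np.asarray(g_loadings, dtype=float)
    d = np.asarray(effect_sizes, dtype=float)
    if g.shape != d.shape or g.size < 3:
        raise ValueError("vectors must have the same length (at least 3)")
    return float(np.corrcoef(g, d)[0, 1])


# Group totals from Table 4: Black exp., Black control, Other exp., Other control
d = gain_effect_size([43.78, 45.46, 50.20, 52.71],
                     [50.10, 48.35, 55.80, 55.36],
                     [6.64, 6.69, 6.05, 3.45])
print(np.round(d, 2))
```

Rounded to two decimals, these values agree with the effect sizes printed in Table 4.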

16.3. g loadings

Using the combined experimental and control group, a principal axis factor analysis on the pretest and posttest scores, respectively, resulted in a first unrotated factor explaining 22% of the variance in the pretest scores and 18% of the variance in the posttest scores. These findings suggest that the g loadedness of the RSPM decreased substantially after the Mediated Learning Experience.
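The sketch below extracts a single factor by iterated principal axis factoring and reports that factor's share of variance. It assumes an item correlation matrix is already available. The text does not say which correlation coefficient or software was used for the dichotomous items, so those choices are left open. The "percentage of variance" is taken here to be the sum of squared first-factor loadings divided by the number of items. The function name is hypothetical.

```python
import numpy as np


def first_factor_paf(corr, tol=1e-8, max_iter=1000):
    """One-factor iterated principal axis factoring.

    Returns (loadings, proportion_of_total_variance_explained).
    """
    r = np.asarray(corr, dtype=float)
    p = r.shape[0]
    # Starting communalities: squared multiple correlations
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(r))
    for _ in range(max_iter):
        reduced = r.copy()
        np.fill_diagonal(reduced, h2)               # communalities on the diagonal
        eigvals, eigvecs = np.linalg.eigh(reduced)  # ascending eigenvalues
        loadings = eigvecs[:, -1] * np.sqrt(max(eigvals[-1], 0.0))
        new_h2 = loadings ** 2
        converged = np.max(np.abs(new_h2 - h2)) < tol
        h2 = new_h2
        if converged:
            break
    if loadings.sum() < 0:                          # eigenvector sign is arbitrary
        loadings = -loadings
    return loadings, float(np.sum(loadings ** 2) / p)
```

Comparing the returned proportion for the pretest and the posttest correlation matrices gives the kind of comparison reported above (22% vs. 18%).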

16.4. Correlation between score gains and sum score

Correlating score gains with RSPM total scores resulted in values of −.60 (p = .00) for the Black experimental group, −.18 (p = .38) for the Black control group, −.82 (p = .00) for the White/Indian/Colored experimental group, and −.48 (p = .08) for the White/Indian/Colored control group. After applying the correction formula of Tucker et al. (1966), these correlations became −.39, −.08, −.61, and −.35, respectively. Overall, these correlations show that low-g persons improved their scores more strongly than high-g persons.

17. Discussion

Skuy et al. (2002) hypothesized that the low-quality education of Blacks in South Africa would lead IQ tests to underestimate their cognitive abilities. Groups of Black and White/Indian/Colored students took Raven's Progressive Matrices twice and in between received Feuerstein's Mediated Learning Experience. The test scores went up substantially in all groups. Evidence for an authentic change in the g factor requires broad transfer, or generalizability across a wide variety of cognitive performance. However, Skuy et al. showed that the gains did not generalize to scores on another, highly similar test or to external criteria, and were therefore hollow. As the score gains were in some cases quite large (14 IQ points for the Black experimental group), the question becomes: what is it that improved?

The findings show that the correlations between score gains and the g loadedness of the items were −.39 for the complete experimental group and −.26 for the complete control group. However, because the g loadings and gain scores are measured at the item level, their reliabilities are not high, which substantially attenuates the correlation between g and d. Moreover, the RSPM does not measure g perfectly: Jensen (1998a, p. 91) estimates its g loading at .83. If we estimate the reliability of the g vector at .70 and the reliability of the gain score vector at .50, correcting for unreliability and for the deviation from perfect construct validity of g alone yields estimated true correlations of −.80 and −.53, respectively. These values should be taken as underestimates. Controlling for additional artifacts will bring them closer to the very strong negative correlation found in the meta-analysis.
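The text does not spell out the correction formula. The sketch below assumes the standard correction for attenuation. It divides the observed vector correlation by the square root of the product of the two vector reliabilities (.70 and .50) and by the RSPM's g loading of .83. Under that assumption the sketch gives −.53 for the control group. For the experimental group it gives −.79 rather than the reported −.80, possibly because the inputs were rounded. The function name is hypothetical.

```python
import math


def disattenuate(r_observed, rel_g_vector, rel_d_vector, g_loading_test=1.0):
    """Correct a vector correlation for unreliability of both vectors and,
    optionally, for the test's imperfect g loading."""
    return r_observed / (math.sqrt(rel_g_vector * rel_d_vector) * g_loading_test)


for r_obs in (-0.39, -0.26):  # experimental and control group, from the text
    print(round(disattenuate(r_obs, 0.70, 0.50, 0.83), 2))
```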

The findings suggest that the g loadedness of the test decreased substantially after training. We found substantial negative correlations between gain scores and RSPM total scores. Table 4 shows that the total score variance decreased after training, which is in line with low-g subjects improving more than high-g subjects. As a rule, high-g individuals profit the most from training, as reflected in the ubiquitous positive correlation between IQ scores and training performance (Jensen, 1980; Schmidt & Hunter, 1998). These findings could therefore be interpreted as an indication that Feuerstein's Mediated Learning Experience is not g-loaded, in contrast with regular trainings, which are clearly g-loaded. Substantial negative correlations between gain scores and RSPM total scores are no definite proof of this hypothesis, but they are in line with it. Further support for our hypothesis that the Feuerstein training has little or no g loadedness comes from Coyle (2006), who showed that gain scores loaded virtually zero on the g factor. Moreover, Skuy et al. reported that the predictive validity of their measure did not increase when the second Raven score was used. The fact that individuals with low g gained more than those with high g could be interpreted as an indication that the Mediated Learning Experience was not g-loaded. It should be noted, however, that Feuerstein most likely did not intend his intervention to be g-loaded; he was interested in increasing the performance of low scorers on both tests and external criteria.

18. General discussion

IQ scores are by far the best general predictor of success in education, job training, and work. However, there are many ways in which these IQ scores can be increased, for instance by retesting or by participating in a learning potential training program. What conclusions can be drawn from such score gains? We tested Jensen's (1998a) hypothesis that the effects of training on abilities can be summarized in terms of Carroll's three-stratum hierarchical factor model in two ways. The first was a meta-analysis of test–retest data from Dutch, British, and American test batteries. The second used learning potential data from South Africa obtained with Raven's Progressive Matrices. The meta-analysis convincingly shows that test–retest score gains are not g-loaded. The findings from the learning potential study are clearly in line with this. When the attenuation caused by unreliability and other artifacts is taken into account, the correlation between the g loadings of items and the gains on items is somewhat comparable to the one found in the meta-analysis of test batteries. The data suggest that the g loadedness of item scores decreases after the intervention training. Te Nijenhuis et al.'s (2001) finding that practice and coaching reduced the g loadedness of their test scores strengthens the present findings based on item scores. The high-g participants, who usually improve most in training situations, did not show the largest score increases; the low-g persons did. This suggests that the intervention training is not g-loaded.

Our findings fit the hierarchical model of intelligence quite well. The generalizability of test scores resides predominantly in the g component, whereas the test-specific ability component and the narrow ability component are virtually non-generalizable. This is evidenced, for instance, by the earlier finding that adding verbal tests or numerical tests to a g score resulted in only a very small incremental validity (Ree & Earles, 1991; Ree et al., 1994). Additionally, Ericsson and Lehmann (1996) reported immense gains on a memory task tapping one narrow ability but found no improvement on comparable memory tasks tapping another narrow ability. As the score gains are not related to g, the generalizable g component decreases. The Feuerstein training itself is plausibly not g-loaded either. It is therefore easy to understand why the score gains did not generalize to scores on the cognitively loaded Representational Stencil Design Test. For a similar reason, the score gains did not generalize to g-loaded external criteria: the correlation of the RSPM scores with performance in the end-of-year psychology examination did not significantly improve after mediation. Reeve and Lam (2005) claimed that retesting does not change the nature of what is being tested, but our findings suggest the opposite.

19. Limitations of the studies

Our meta-analysis and our analysis of the South African study rely heavily on the method of correlated vectors (MCV), which has recently been shown to have limitations. Dolan and Lubke (2001) showed that, when groups are compared, substantial positive vector correlations can still be obtained even when the groups differ not only on g but also on factors uncorrelated with g. Ashton and Lee (2005) showed that associations of a variable with non-g sources of variance can produce a vector correlation of zero even when the variable is strongly associated with g. They suggest that the g loadings of a subtest are sensitive to the nature of the other subtests in a battery, so that a specific sample of subtests may cause a spurious correlation between the vectors. Notwithstanding these limitations, studies using MCV continue to appear (see, for instance, Colom, Haier, & Jung, in press; Hartmann, Kruuse, & Nyborg, in press; Lee et al., 2006). The outcomes of our meta-analysis of a large number of studies using MCV may make an interesting contribution to the discussion of the method's limitations.

A principle of meta-analysis is that any individual study contains only a modest amount of information. One should therefore analyze all studies on a topic and correct for artifacts, which greatly increases the amount of information. Our meta-analytical value of r = −1.06 is virtually identical to the theoretically expected correlation between g and d of −1.00. This holds some promise that a psychometric meta-analysis of studies using MCV is a powerful way of reducing some of the limitations of MCV. An alternative methodological approach is to limit oneself to the rare datasets that allow structural equation modeling. However, from a meta-analytical point of view, these studies yield only a modest amount of information.
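The individual studies are not listed in this section, so the sketch below shows only the bare-bones step of a Hunter–Schmidt psychometric meta-analysis (Hunter & Schmidt, 2004). That step consists of sample-size weighting of the correlations and removal of the variance expected from sampling error. The corrections for unreliability and other artifacts, which produce a corrected value such as r = −1.06, are applied on top of this step and are not shown. The function name and the inputs in the test are hypothetical.

```python
def bare_bones_meta(rs, ns):
    """Bare-bones (Hunter-Schmidt style) meta-analysis of correlations.

    Returns (N-weighted mean r, observed variance of r,
             expected sampling-error variance, residual variance).
    """
    if len(rs) != len(ns) or not rs:
        raise ValueError("rs and ns must be non-empty and of equal length")
    total_n = sum(ns)
    mean_r = sum(n * r for r, n in zip(rs, ns)) / total_n
    # N-weighted variance of the observed correlations
    var_obs = sum(n * (r - mean_r) ** 2 for r, n in zip(rs, ns)) / total_n
    # Variance expected from sampling error alone, using the average N
    mean_n = total_n / len(ns)
    var_err = (1 - mean_r ** 2) ** 2 / (mean_n - 1)
    return mean_r, var_obs, var_err, var_obs - var_err
```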

Additional meta-analyses of studies employing MCV are necessary to establish the validity of combining MCV with psychometric meta-analysis. Many would probably agree on two points. A high positive meta-analytical correlation between measures of g and measures of another construct implies that g plays a major role. A meta-analytical correlation of −1.00 implies that g plays no role. However, it is not clear what value of the meta-analytical correlation to expect from MCV when g plays only a modest role. The present meta-analysis examined a construct that clearly has an inverse relationship with g. It would also be informative to carry out meta-analyses of studies on variables that are strongly linked to g and on variables that are modestly linked to g. An example of the latter would be secular score gains, which, according to Lynn's (1990) nutrition theory, should be modestly g-loaded.

The sample sizes in the South African study are not large, but they are still larger than those in many other studies of learning potential, where N ≈ 10 is not unusual. A reanalysis of the many existing studies on dynamic testing could lead to a meta-analysis with a large combined N. The mean posttest score was quite high, so a ceiling effect may have occurred for the White/Indian/Colored group, leading to an underestimation of the experimental score gain for this group.

A multidimensional test would be better suited to testing the hypothesis than a strongly unidimensional test such as the RSPM. Moreover, a large sample would allow more rigorous data-analytical techniques, leading to more definitive results. However, to the best of our knowledge, no datasets meeting these requirements exist, and the Skuy et al. study is arguably the best South African learning potential study available.

20. Score gains as low-quality measures of motivation

As criterion-related validity is strongly dependent on g, te Nijenhuis et al.'s finding of lowered g loadings after training should result in lowered criterion-related validity. However, the empirical findings show the opposite. Consider the test–retest and test preparation studies on cognitive tests and scholastic aptitude tests that reported criterion-related validities for both testings. Virtually all of them show small to modest increases in criterion-related validity for the second or third test score (see Allalouf & Ben-Shakhar, 1998; Bashi, 1976; Coyle, 2006; Hausknecht, Trevor, & Farr, 2002; Jones, 1986; Linn, 1977; Olsen & Schrader, 1959; Ortar, 1960; Powers, 1985; Reeve & Lam, 2005). Allalouf and Ben-Shakhar (1998) carried out a carefully designed study of a university entrance test. Their experimental group received an intensive 40-h test coaching program, whereas the control group did not. The criterion-related validity for the retest increased in both groups. Most importantly, the increase was the same in both groups; it was not larger for the experimental group.

Resing (1990, see Table 4.23) conducted a little-known but carefully designed large-scale learning potential study. She compared an experimental group, which received a pretest, a learning potential training, and a posttest, with a control group that received only the pretest and the posttest. The mean criterion-related validity of the various second scores was .62 for both the experimental and the control group. Learning potential training thus added no criterion-related validity beyond what simple retesting provided. The findings of both Resing and Allalouf and Ben-Shakhar suggest that cognitive interventions do not increase criterion-related validity more than simple retesting does.

g and the personality measure conscientiousness have been shown to make an excellent combination of predictors (Schmidt & Hunter, 1998). Conscientiousness represents, among other characteristics, persistence, a will to achieve, and the ability to focus effort on a goal. A field study on test preparation among actual job applicants (Clause, Delbridge, Schmitt, Chan, & Jennings, 2001) showed that motivation to perform well on the test correlated .25 with test performance. One could speculate that score increases do not reflect a true cognitive component but instead become low-quality measures of motivation. Retesting and learning potential training increase validity only modestly, whereas personality questionnaires can increase it substantially. Personality testing might therefore provide a less expensive and more accurate alternative.
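The argument that a personality measure adds more predictive validity than retesting is an argument about incremental validity. For two standardized predictors, the multiple correlation with a criterion follows from the three pairwise correlations. The sketch below uses made-up correlations, not figures from Schmidt and Hunter (1998). It shows that a second predictor adds much more when it is uncorrelated with the first than when the two overlap. The function name is hypothetical.

```python
import math


def two_predictor_R(r_y1, r_y2, r_12):
    """Multiple correlation of a criterion with two standardized predictors.

    r_y1, r_y2: correlations of each predictor with the criterion.
    r_12: correlation between the two predictors.
    """
    if abs(r_12) >= 1:
        raise ValueError("predictors must not be perfectly correlated")
    r_squared = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)
    return math.sqrt(r_squared)


# Hypothetical values: predictor 1 = g measure, predictor 2 = second measure
print(round(two_predictor_R(0.50, 0.30, 0.0), 3))  # uncorrelated predictors
print(round(two_predictor_R(0.50, 0.30, 0.5), 3))  # overlapping predictors
```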

21. Effectiveness of various training formats

Components of the mediation training used by Skuy et al. (2002) are similar to the test training used by te Nijenhuis et al. (2001). Both the Dutch and the South African training took 3 h. However, the Dutch training covered two different test formats, whereas the South African training dealt with only one. The test training by Lloyd and Pidgeon (1961) took even less time, namely two half-hour segments, each focusing on one test format. The effect sizes in all studies were roughly comparable. This suggests that the methods used by te Nijenhuis et al. and by Lloyd and Pidgeon were more efficient than those used by Skuy et al. Some components of the mediation training are absent from the other two training formats. These components may be ineffective in raising test scores and could then be left out. If so, it might be possible to raise RSPM scores by one SD with a relatively simple 1-h training.

22. Generalizability of findings

Can these findings of hollow score gains after test–retest, test practice, and Mediated Learning Experience training be generalized to other studies in which training-induced score gains were found? Ericsson and Lehmann (1996) reported tremendous score increases after intensive training on numeric memory tests, but these gains did not generalize in the least to verbal memory tests. Gains on one narrow ability that do not generalize to another narrow ability under the same broad ability are hollow. Similarly, Jensen (1998b) showed that score gains due to adoption were not on the g factor and were therefore most likely hollow.

Rushton (1999) argued that intergenerational score gains are not linked to g, suggesting that the Flynn effect may be empty, but he was strongly criticized by Flynn (1999, 2000). In studies on the Flynn effect, the score gains found in cross-sectional studies are largest on the RSPM (Flynn, 1987). Lynn (1998) has suggested that a substantial part of these intergenerational gains on the RSPM is generalizable and reflects higher g. He suggested that the remaining part is hollow and should be interpreted as schooling effects. The RSPM does require applying the mathematical principles of addition, subtraction, progression, and the distribution of values. These increases in RSPM scores occurred over three decades (1950s–1980s). During that period, increasing proportions of 15- to 18-year-olds remained in school, where they learned math skills that they could apply to solving matrices problems. Our findings could be interpreted as support for Lynn's hypothesis that the score gains on the RSPM are partly hollow. Notwithstanding the high g loading of the RSPM sum score, it is quite sensitive to test–retest effects and training effects. Some studies on the Flynn effect (Lynn & Hampson, 1986; Teasdale & Owen, 1989) show that the increase in scores is largely concentrated in the lower segments of the IQ distribution. Our finding that low scorers show the largest gains after training may additionally support the notion that part of the Flynn effect on the RSPM is hollow. Finally, Wicherts et al. (2004) found that in some of their datasets the secular score gains were most strongly linked to broad-, narrow-, and test-specific abilities, showing that an important part of the gains is non-generalizable.

Ceci (1991) showed that increased schooling leads to higher IQ scores, but are these gains highly specific or predominantly generalizable? It would be interesting to apply the techniques used in this study to the findings from previous intervention studies. Biological interventions (such as diet, vitamin supplements, and vaccination against infectious disease) may be more cost-effective than psychological or educational interventions at producing true changes in g and broad abilities. There may be a biological barrier between the first and the second stratum that restricts the effects of behavioral interventions to narrow abilities and test specificities.

Acknowledgement

We would like to thank Mervyn Skuy for his permission to use his dataset.

Thanks to Marié de Beer, Raegan Murphy, Welko Tomic, Art Jensen, and Frank Schmidt for feedback on previous versions of this paper.

Thanks also to Arne Evers, Wilma Resing (Dutch Test Committee), and Andress Kooij (Harcourt) for helping to locate test–retest studies.

References

Ackerman, P. L. (1986). Individual differences in information processing: An investigation of intellectual abilities. Intelligence, 10, 101–139.

Ackerman, P. L. (1987). Individual differences in skill learning: An integration of psychometric and information processing skills. Psychological Bulletin, 102, 3–27.

Allalouf, A., & Ben-Shakhar, G. (1998). The effect of coaching on the predictive validity of scholastic aptitude tests. Journal of Educational Measurement, 35(1), 31–47.

Ashton, M. C., & Lee, K. (2005). Problems with the method of correlated vectors. Intelligence, 33, 431–444.

Bashi, Y. (1976). Verbal and non-verbal abilities of 4th, 6th, and 8th grade students in the Arab educational system in Israel. Jerusalem: Hebrew University, School of Education.

Bleichrodt, N., Resing, W. C. M., Drenth, P. J. D., & Zaal, J. N. (1987). Intelligentie-meting bij kinderen: Empirische en methodologische verantwoording van de geReviseerde Amsterdamse Kinder Intelligentie Test [Measuring the intelligence of children: Empirical and methodological justification of the Revised Amsterdam Children Intelligence Test]. Lisse, the Netherlands: Swets.

Bennett, G. K., Seashore, H. G., & Wesman, A. G. (1974). Differential Aptitude Tests (5th ed.). Manual. New York: The Psychological Corporation.

Boeyens, J. C. A. (1989). Learning potential: An empirical investigation. Pretoria, South Africa: Human Science Research Council.

Bosch, F. (1973). Inventarisatie, beschrijving en onderzoek m.b.t. de wijzigingen van de GATB, incl. test-hertest onderzoek (No. Pz3bRp0120) [Stock-taking, description, and research concerning the modifications of the GATB, including a test–retest study]. Utrecht, the Netherlands: Nederlandse Spoorwegen.

Carroll, J. B. (1993). Human cognitive abilities: A survey of factor analysis studies. Cambridge University Press.

Ceci, S. J. (1991). How much does schooling influence general intelligence and its cognitive components? A reassessment of the evidence. Developmental Psychology, 27, 703–722.

Christian, K., Bachnan, H. J., & Morrison, F. J. (2001). Schooling and cognitive development. In R. J. Sternberg & E. L. Grigorenko (Eds.), Environmental effects on cognitive abilities (pp. 287–335). Mahwah, NJ: Erlbaum.

Clause, C. S., Delbridge, K., Schmitt, N., Chan, D., & Jennings, D. (2001). Test preparation activities and employment test performance. Human Performance, 14, 149–167.

Cohen, J. (1988). Statistical power analysis for the behavioral sciences. Hillsdale: Lawrence Erlbaum.

Colom, R., Jung, R. E., & Haier, R. J. (in press). Finding the g-factor in brain structure using the method of correlated vectors. Intelligence.

Covin, T. A. (1977). Stability of the WISC-R for 9-year-olds with learning difficulties. Psychological Reports, 40, 1297–1298.

Coyle, T. R. (2006). Test–retest changes on scholastic aptitude tests are not related to g. Intelligence, 34, 15–27.

Cronbach, L. J. (1990). Essentials of psychological testing. New York: HarperCollins.

de Villiers, A. B. (1999). Disadvantaged students' academic performance: Analysing the zone proximal development. Unpublished D.Phil. thesis, University of Cape Town, South Africa.

de Wolff, C. J., & Buiten, B. (1963). Een factoranalyse van vier testbatterijen [A factor analysis of four test batteries]. Nederlands Tijdschrift voor Psychologie, 18, 220–239.

Dolan, C. V., & Lubke, G. (2001). Viewing Spearman's hypothesis from the perspective of multigroup PCA: A comment on Schönemann's criticism. Intelligence, 29, 231–245.

Drenth, P. J. D., Petrie, J. F., & Bleichrodt, N. (1968). Handleiding bij de Amsterdamse Kinder Intelligentie Test [Manual of the Amsterdam Children Intelligence Test]. Amsterdam: Vrije Universiteit.

Elliott, C. D. (1983). British Ability Scales. Manual 2: Technical handbook. Windsor, Great Britain: NFER-Nelson.

Engelbrecht, M. (1999). Leerpotensiaal as voorspeller van akademiese sukses van universiteitsstudente [Learning potential as predictor of the academic success of university students]. Unpublished D.Phil. thesis, Potchefstroom University for Christian Higher Education, South Africa.

Ericsson, K. A., & Lehmann, A. C. (1996). Expert and exceptional performance: Evidence of maximal adaptation to task constraints. Annual Review of Psychology, 47, 273–305.

Evers, A., & Lucassen, W. (1991). Handleiding DAT 83 Differentiële Aanleg Testserie [Manual DAT 83 Differential Aptitude Test series]. Amsterdam: Swets.

Fleishman, E. A., & Hempel, W. E. (1955). The relation between abilities and improvement with practice in a visual discrimination reaction task. Journal of Experimental Psychology, 49, 301–312.

Flynn, J. R. (1987). Massive IQ gains in 14 nations: What IQ tests really measure. Psychological Bulletin, 101, 171–191.

Flynn, J. R. (1999). Evidence against Rushton: The genetic loading of WISC-R subtests and the causes of between-group IQ differences. Personality and Individual Differences, 26, 373–379.

Flynn, J. R. (2000). IQ gains, WISC subtests and fluid g: g theory and the relevance of Spearman's hypothesis to race. In G. R. B. J. Goode (Ed.), The nature of intelligence (pp. 202–227). New York: Wiley.

Gaydon, V. P. (1988). Predictors of performance of disadvantaged adolescents on the Soweto/Alexandra gifted child programme. Unpublished M.Ed. dissertation, University of the Witwatersrand, South Africa.

Gottfredson, L. S. (1997). Why g matters: The complexity of everyday life. Intelligence, 24(1), 79–132.

Gottfredson, L. S. (2002). g: Highly general and highly practical. In R. J. Sternberg & E. L. Grigorenko (Eds.), The general intelligence factor: How general is it? (pp. 331–380). Mahwah, NJ: Erlbaum.

Grigorenko, E. L., & Sternberg, R. J. (1998). Dynamic testing. Psychological Bulletin, 124, 75–111.

Haeck, W., Yeld, N., Conradie, J., Robertson, N., & Shall, A. (1997). A developmental approach to mathematics testing for university admissions and course placement. Educational Studies in Mathematics, 33, 71–91.

Hartmann, P., Kruuse, N. H. S., & Nyborg, H. (in press). Testing the cross-racial generality of Spearman's hypothesis in two samples. Intelligence.

Hausknecht, J. P., Trevor, C. O., & Farr, J. L. (2002). Retaking ability tests in a selection setting: Implications for practice effects, training performance, and turnover. Journal of Applied Psychology, 87(2), 243–254.

Hunter, J. E., & Schmidt, F. L. (1990). Methods of meta-analysis. London: Sage.

Hunter, J. E., & Schmidt, F. L. (2004). Methods of meta-analysis (2nd ed.). London: Sage.

Jensen, A. R. (1980). Bias in mental testing. London: Methuen.

Jensen, A. R. (1985). The nature of the black–white difference on various psychometric tests: Spearman's hypothesis. Behavioral and Brain Sciences, 8, 193–263.

Jensen, A. R. (1998a). The g factor: The science of mental ability. London: Praeger.

Jensen, A. R. (1998b). Adoption data and two g-related hypotheses. Intelligence, 25, 1–6.

Johnson, W., Bouchard, T. J., Krueger, R. F., Jr., McGue, M., & Gottesman, I. I. (2004). Just one g: Consistent results from three test batteries. Intelligence, 32, 95–107.

Johnson, W., te Nijenhuis, J., & Bouchard, T. J., Jr. (in press). Replication of the hierarchical visual-perceptual-image rotation model in de Wolff and Buiten's (1963) battery of 46 tests of mental ability. Intelligence.

Jones, R. J. (1986). A comparison of the predictive validity of the MCAT for coached and uncoached students. Journal of Medical Education, 61, 335–338.

Kaufman, A. S., & Kaufman, N. L. (1983). K-ABC: Kaufman Assessment Battery for Children. Interpretive manual. Circle Pines, MN: AGS.

Kooij, A. P., Rolfhus, E., Wilkins, C., Yang, Z., & Zhu, J. (2005). WAIS-III Nederlandstalige bewerking: Technisch rapport hernormering [WAIS-III adaptation in Dutch: Technical report renorming]. Amsterdam: Harcourt.

Kort, W., Schittekatte, M., Dekker, P. H., Verhaeghe, P., Compaan, E. L., Bosmans, M., & Vermeir, G. (2005). WISC-III NL: Wechsler Intelligence Scale for Children, Derde Editie NL. Handleiding en verantwoording [The Dutch WISC-III: Wechsler Intelligence Scale for Children, Third Edition for the Netherlands. Manual and justification]. Amsterdam: NIP.

Kulik, J. A., Bangert-Drowns, R. L., & Kulik, C. C. (1984). Effectiveness of coaching for aptitude tests. Psychological Bulletin, 95, 179–188.

Kulik, J. A., Kulik, C. C., & Bangert, R. L. (1984). Effects of practice on aptitude and achievement test scores. American Educational Research Journal, 21, 435–447.

Lee, K. H., Choi, Y. Y., Gray, J. R., Cho, S. H., Chae, J.-H., Lee, S., et al. (2006). Neural correlates of superior intelligence: Stronger recruitment of posterior parietal cortex. NeuroImage, 29(2), 578–586.

Linn, R. L. (1977). On the treatment of multiple scores for Law School Admission Test repeaters (Report LSAC-77-4). In Law School Admission Council, Reports of LSAC sponsored research: Volume III, 1975–1977. Princeton, NJ: Law School Admission Council.

Lipson, L. E. (1992). Relationship of static and dynamic measures to scholastic achievement of black pupils. Unpublished M.Ed. dissertation, University of Witwatersrand, South Africa.

Lloyd, F., & Pidgeon, D. A. (1961). An investigation into the effects of coaching on non-verbal test material with European, Indian, and African children. British Journal of Educational Psychology, 31, 145–151.

Luteijn, F., & Barelds, D. P. H. (2005). GIT2: Groninger Intelligentie Test 2 [GIT2: Groningen Intelligence Test 2]. Amsterdam: Harcourt.

Lynn, R. (1990). The role of nutrition in secular increases in intelligence. Personality and Individual Differences, 11, 273–285.

Lynn, R. (1998). In support of the nutrition theory. In U. Neisser (Ed.), The rising curve: Long-term gains in IQ and related measures (pp. 207–215). Washington, DC: American Psychological Association.

Lynn, R., Allik, J., & Irwing, P. (2004). Sex differences on three factors identified in Raven's Standard Progressive Matrices. Intelligence, 32, 411–424.

Lynn, R., & Hampson, S. (1986). The rise of national intelligence: Evidence from Britain, Japan and the USA. Personality and Individual Differences, 7, 23–32.

Matarazzo, J. D., Carmody, T. P., & Jacobs, L. D. (1980). Test–retest reliability and stability of the WAIS: A literature review with implications for clinical practice. Journal of Clinical Neuropsychology, 2(2), 89–105.

McCormick, B. K., Dunlap, W. P., Kennedy, R. S., & Jones, M. B. (1983). The effects of practice on the Armed Forces Vocational Aptitude Test Battery. US Army Research Institute for the Behavioral and Social Sciences, Technical Report 602.

Mulder, J. L., Dekker, R., & Dekker, P. H. (2004). Kaufman Intelligentietest voor adolescenten en volwassenen (KAIT). Handleiding [Kaufman Intelligence Test for adolescents and adults (KAIT). Manual]. Leiden, the Netherlands: PITS.

Murphy, R. (2002). A review of South African research in the field of dynamic assessment. Unpublished MA dissertation, University of Pretoria (available online from http://upetd.up.ac.za/thesis/available/etd-05042002-161239).

Nel, A. (1997). Die voorspelling van akademiese sukses binne konteks van 'n alternatiewe universiteitstoelatingsbeleid [The prediction of academic success within the context of an alternative policy of university admission]. Unpublished MA dissertation, Rand Afrikaans University, South Africa.

Neubauer, A. C., & Freudenthaler, H. H. (1994). Reaction time in a sentence-picture verification test and intelligence: Individual strategies and effects of extended practice. Intelligence, 19, 193–218.

Nunnally, J. C., & Bernstein, I. H. (1994). Psychometric theory (3rd ed.). New York: McGraw-Hill.

Olsen, M., & Schrader, W. B. (1959). The use of preliminary and final Scholastic Aptitude Test scores in predicting college grades (College Entrance Examination Board Research and Development Reports and Statistical Reports 59-19). Princeton, NJ: Educational Testing Service.

Ortar, G. R. (1960). Improving test validity by coaching. Educational Research, 2, 137–142.

Powers, D. E. (1985). Effects of test preparation on the validity of Graduate Admission Test. Applied Psychological Measurement, 9, 179–190.

Raven, J., Raven, J. C., & Court, J. H. (2000). Standard Progressive Matrices. Raven manual: Section 3. Oxford Psychologists Press.

Ree, M. J., & Carretta, T. R. (1994). The correlation of general cognitive ability and psychomotor tracking tests. International Journal of Selection and Assessment, 2, 209–216.

Ree, M. J., & Earles, A. A. (1991). Predicting training success: Not much more than g. Personnel Psychology, 44, 321–332.

Ree, M. J., Earles, J. A., & Teachout, M. S. (1994). Predicting job performance: Not much more than g. Journal of Applied Psychology, 79, 518–524.

Reeve, C. L., & Lam, H. (2005). The psychometric paradox of practice effects due to retesting: Measurement invariance and stable ability estimates in the face of observed score changes. Intelligence, 33, 535–549.

Resing, W. C. M. (1990). Intelligentie en leerpotentieel: Een onderzoek naar het leerpotentieel van jonge leerlingen uit het basis- en speciaal onderwijs [Intelligence and learning potential: A study into the learning potential of young students in basic and special education]. Amsterdam, the Netherlands: Swets.

Rushton, J. P. (1999). Secular gains in IQ are not related to the g factor and inbreeding depression, unlike black–white differences: A reply to Flynn. Personality and Individual Differences, 26, 381–389.

Rushton, J. P., Skuy, M., & Bons, T. A. (2004). Construct validity of Raven's Advanced Progressive Matrices for African and non-African engineering students in South Africa. International Journal of Selection and Assessment, 12(3), 220–229.

Schmidt, F. L., & Hunter, J. E. (1998). The validity and utility of selection methods in personnel psychology: Practical and theoretical implications of 85 years of research findings. Psychological Bulletin, 124(2), 262–274.

Schmidt, F. L., & Hunter, J. E. (1999). Theory testing and measurement error. Intelligence, 27(3), 183–198.

Schmidt, F. L., & Le, H. (2004). Software for the Hunter–Schmidt meta-analysis methods. University of Iowa, Department of Management and Organization, Iowa City, IQ 42242.

Schroots, J. J. F., & van Alphen de Veer, R. J. (1979). LDT: Leidse Diagnostische Test. Deel 1, Handleiding [LDT: Leiden Diagnostic Test. Part 1, Manual]. Lisse, the Netherlands: Swets.

Shochet, I. M. (1986). Manifest and potential performance in advantaged and disadvantaged students. Unpublished D.Phil. dissertation, University of the Witwatersrand, South Africa.

300 J. te Nijenhuis et al. / Intelligence 35 (2007) 283–300

Skuy, M., Gewer, A., Osrin, Y., Khunou, D., Fridjon, P., & Rushton, J. P. (2002). Effects of mediated learning experience on Raven's Matrices scores of African and non-African university students in South Africa. Intelligence, 30, 221–232.

Swanson, H. E., & Lussier, C. M. (2001). A selective synthesis of the experimental literature on dynamic assessment. Review of Educational Research, 71, 321–363.

Teasdale, T. W., & Owen, D. R. (1989). Continuing secular increase in intelligence and a stable prevalence of high intelligence levels. Intelligence, 13, 255–262.

Tuma, J. M., & Appelbaum, A. S. (1980). Reliability and practice effects of WISC-R IQ estimates in a normal population. Educational and Psychological Measurement, 40, 671–678.

te Nijenhuis, J., Tolboom, E., Resing, W., & Bleichrodt, N. (2004). Does cultural background influence the intellectual performance of children from immigrant groups? Validity of the RAKIT intelligence test for immigrant children. European Journal of Psychological Assessment, 20, 10–26.

te Nijenhuis, J., & van der Flier, H. (1997). Comparability of GATB scores for immigrants and majority group members: Some Dutch findings. Journal of Applied Psychology, 82, 675–687.

te Nijenhuis, J., Voskuijl, O. F., & Schijve, N. B. (2001). Practice and coaching on IQ tests: Quite a lot of g. International Journal of Selection and Assessment, 9, 302–308.

Thorndike, R. L. (1985). The central role of general ability in prediction. Multivariate Behavioral Research, 20, 241–254.

Tucker, L. R., Damarin, F., & Messick, S. (1966). A base-free measure of change. Psychometrika, 31(4), 457–473.

van der Doef, M. P., Kwint, J. M., & van der Koppel (1989). Wat leren moeilijk lerende kinderen van de WISC-R? [What do children who have difficulties in learning learn from the WISC-R?]. Kind en Adolescent, 10, 136–141.

United States Department of Labor. (1970). Manual for the USTES General Aptitude Test Battery, Section III: Development. Washington, DC: United States Department of Labor.

van Geffen (1972). De betrouwbaarheid van de GATB 1002-B op brugklasniveau [The reliability of the GATB 1002-B for the first class at secondary school]. Catholic University Nijmegen, the Netherlands: Psychology of Work and Organisation.

van Haasen, P. P., de Bruyn, E. E. J., Pijl, Y. J., Poortinga, Y. H., Lutje Spelberg, H. C., Vander Steene, G., et al. (1986). WISC-R: Wechsler Intelligence Scale for Children-Revised. Nederlandstalige uitgave [WISC-R: Wechsler Intelligence Scale for Children-Revised. Dutch edition]. Lisse, the Netherlands: Swets.

Wechsler, D. (1955). Manual for the Wechsler Adult Intelligence Scale. New York: The Psychological Corporation.

Wechsler, D. (1967). Manual for the Wechsler Preschool and Primary Scale of Intelligence. New York: The Psychological Corporation.

Wechsler, D. (1974). Manual for the Wechsler Intelligence Scale for Children-Revised. New York: The Psychological Corporation.

Wechsler, D. (1981). WAIS-R manual: Wechsler Adult Intelligence Scale-Revised. New York: The Psychological Corporation.

Wechsler, D. (1997). WAIS-III: Wechsler Adult Intelligence Scale-third edition and WMS-III: Wechsler Memory Scale-third edition. Technical manual. New York: The Psychological Corporation.

Wicherts, J. W., Dolan, C. V., Oosterveld, P., van Baal, G. C. V., Boomsma, D. I., & Span, M. M. (2004). Are intelligence tests measurement invariant over time? Investigating the nature of the Flynn effect. Intelligence, 32(5), 509–537.

Yeld, N., & Haeck, W. (1997). Educational histories and academic potential: Can tests deliver? Assessment and Evaluation in Higher Education, 22, 5–16.

Zaaiman, H. (1998). Selecting students for mathematics and science: The challenge facing higher education in South Africa. Pretoria, South Africa: HSRC Publishers.

Zaaiman, H., van der Flier, H., & Thijs, G. D. (2001). Dynamic testing in selection for an educational programme: Assessing South African performance on the Raven Progressive Matrices. International Journal of Selection and Assessment, 9, 258–269.

Zolezzi, S. A. (1992). Alternative selection measures for university undergraduate admissions. Unpublished MEd dissertation, University of the Witwatersrand, South Africa.

Zolezzi, S. A. (1995). The effectiveness of dynamic assessment as an alternative aptitude testing strategy. Unpublished DPhil dissertation, University of South Africa, South Africa.


13. Procedure

The students participated in pre- and posttest phases, with a group intervention in between. The study focused on improvement in scores on the RSPM, using Set Variations II of the Learning Propensity Assessment Device as the mediation task. Mediation training took 3 h and was conducted by three experienced psychologists with the assistance of six postgraduate psychology students. A detailed description is given in Skuy et al. (2002).

14. Measures and cognitive intervention

The Raven's Standard Progressive Matrices consists of 60 items (divided into 5 sets of 12 items) designed to measure the ability to form comparisons, to reason by analogy, and to organize spatial information into related wholes. It has been established as one of the purest measures of g (Jensen, 1998a). Skuy et al. (2002) found no evidence for test bias against Blacks in South African education, and Rushton, Skuy, and Bons (2004) showed that the Raven's gave comparable predictive validities for students from various groups. Cross-cultural testing research has clearly shown that insufficient proficiency in the language of the test can lead to biased assessments in tests with a strong verbal component; the Raven's, however, is a non-verbal test.

The Learning Propensity Assessment Device consists of 14 exercises. Each exercise contains an initial mediation task; subsequent tasks increase in complexity and novelty and aim to help the learner achieve mastery over the task. The purpose of mediation is to help the learner develop the cognitive strategies and functions needed for the successful completion of the task. Set Variations II of the Learning Propensity Assessment Device consists of five sets of items that comprise variations of Sets C, D, and E of the RSPM. Each set of variations contains a learning task for the purpose of initial mediation, followed by a series of progressively more difficult variations to which the learned skills must be applied. Mediation involves discussing with groups how to define the problem to be solved, focus on the task, set rules, regulate problem-solving behavior, and identify the correct sequence of logical steps needed to solve the task. Mediation also involves helping the subject to develop appropriate concepts, verbal tools, and insights in relation to the task. A detailed description is given in Skuy et al. (2002).

15. Statistical analyses

Although the Skuy et al. study is among the South African learning potential studies with the largest sample size, the N is not large. We therefore chose basic statistical analyses.

15.1. Descriptive statistics

Means, standard deviations, and reliabilities were computed for the various groups. With regard to measures of effect size, Hunter and Schmidt (1990, p. 271) advise choosing the estimates of variance with the least error. Because repeated test taking tends to change the size of the SD (Ackerman, 1987), we used the SD of the pretests as the denominator. The correlation between scores before and after the training was computed to see whether the training had an effect on the rank order of individuals' scores.
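A minimal Python sketch of the effect size described above, d = (posttest mean − pretest mean) / pretest SD, applied to the group means and pretest SDs reported in Table 4. The dictionary layout and the function name `gain_effect_size` are illustrative choices, not part of the original analysis; with these inputs the sketch reproduces the d values of 0.95, 0.43, 0.93, and 0.77 reported in Table 4.

```python
# Group statistics from Table 4 (raw RSPM scores): pretest mean, posttest mean, pretest SD.
TABLE_4 = {
    "Black experimental": (43.78, 50.10, 6.64),
    "Black control": (45.46, 48.35, 6.69),
    "Other experimental": (50.20, 55.80, 6.05),
    "Other control": (52.71, 55.36, 3.45),
}


def gain_effect_size(pre_mean: float, post_mean: float, pre_sd: float) -> float:
    """Standardized gain: (posttest mean - pretest mean) / pretest SD."""
    if pre_sd <= 0:
        raise ValueError("pretest SD must be positive")
    return (post_mean - pre_mean) / pre_sd


for group, (pre_mean, post_mean, pre_sd) in TABLE_4.items():
    print(f"{group}: d = {gain_effect_size(pre_mean, post_mean, pre_sd):.2f}")
```

Using the pretest SD rather than a pooled SD keeps the denominator free of the SD shrinkage that retesting can produce.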

15.2. Correlation between score gains and g loadedness

Because our sample was small and quite specific, estimates of g loadedness were taken from Lynn, Allik, and Irwing's (2004) item analysis of the RSPM in Estonia, which used a large (N = 2735), nationally representative sample. The same reasoning as in psychometric meta-analysis applies, namely that larger samples give better estimates of g loadings than smaller samples. In a hierarchical factor analysis of the items using structural equation modeling, Lynn et al. computed g loadings for 52 of the 60 items. In the present study, Pearson correlations were calculated between the g loadings of these 52 items and the effect sizes on these items.
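At its core, this method of correlated vectors is a single Pearson correlation computed across items: one vector holds each item's g loading, the other that item's gain effect size. The sketch below assumes the two vectors are already aligned item by item; the numbers are made-up toy values for illustration only, not Lynn et al.'s loadings or this study's effect sizes.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy vectors (NOT the study's data), aligned by item position:
# one g loading and one gain effect size (d) per item.
g_loadings = [0.30, 0.45, 0.50, 0.62, 0.70]
item_gains = [0.40, 0.35, 0.20, 0.15, 0.05]

print(round(pearson_r(g_loadings, item_gains), 2))
```

A strongly negative value, as in this toy example, would indicate that the items with the highest g loadings show the smallest gains.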

15.3. g loadings

The RSPM consists of dichotomous items, so we computed a matrix of polychoric correlations (Nunnally & Bernstein, 1994). A principal axis factor analysis was carried out, and the percentage of variance explained by the first unrotated factor was taken as an estimate of g loadedness. Because sample size was limited, we collapsed the experimental and the control group.
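The sketch below shows iterated principal axis factoring in pure Python. It extracts only the first unrotated factor and reports that factor's share of total variance, which is the quantity used above as an estimate of g loadedness. It assumes the item correlation matrix (polychoric in the study) has already been computed. The starting communalities (largest absolute off-diagonal correlation per row), the power-iteration eigensolver, and the function names are illustrative choices and may differ from what the authors' software did. The usage example builds a hypothetical one-factor correlation matrix, for which the procedure should recover the generating loadings and a share of (.64 + .49 + .36 + .25) / 4 = .435.

```python
import math


def top_eigenpair(matrix, max_iter=1000, tol=1e-13):
    """Dominant eigenvalue and unit eigenvector of a symmetric matrix via power
    iteration (assumes the dominant eigenvalue is positive, as it is for a
    reduced correlation matrix with a strong first factor)."""
    n = len(matrix)
    vec = [1.0 / math.sqrt(n)] * n
    for _ in range(max_iter):
        prod = [sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in prod))
        new_vec = [x / norm for x in prod]
        done = max(abs(a - b) for a, b in zip(new_vec, vec)) < tol
        vec = new_vec
        if done:
            break
    # Rayleigh quotient of the unit vector gives the eigenvalue.
    eigval = sum(vec[i] * sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n))
    return eigval, vec


def paf_first_factor(corr, max_iter=500, tol=1e-10):
    """Iterated principal axis factoring, one factor: (loadings, variance share)."""
    p = len(corr)
    # Starting communalities: largest absolute off-diagonal correlation per row.
    h2 = [max(abs(corr[i][j]) for j in range(p) if j != i) for i in range(p)]
    loadings = [0.0] * p
    for _ in range(max_iter):
        # Reduced correlation matrix: communalities replace the 1s on the diagonal.
        reduced = [[h2[i] if i == j else corr[i][j] for j in range(p)] for i in range(p)]
        eigval, vec = top_eigenpair(reduced)
        if sum(vec) < 0:  # eigenvector sign is arbitrary; keep loadings positive
            vec = [-x for x in vec]
        loadings = [x * math.sqrt(eigval) for x in vec]
        new_h2 = [x * x for x in loadings]
        converged = max(abs(a - b) for a, b in zip(new_h2, h2)) < tol
        h2 = new_h2
        if converged:
            break
    # Sum of squared loadings divided by p = share of total standardized variance.
    return loadings, sum(h2) / p


# Hypothetical one-factor example: loadings .8, .7, .6, .5 generate the correlations.
true_loadings = [0.8, 0.7, 0.6, 0.5]
corr = [[1.0 if i == j else a * b for j, b in enumerate(true_loadings)]
        for i, a in enumerate(true_loadings)]
loadings, share = paf_first_factor(corr)
print([round(x, 3) for x in loadings], round(share, 3))
```

For a real 60-item matrix, a library routine would normally be used instead. The point here is only that the "percentage of variance explained" equals the first factor's sum of squared loadings divided by the number of items.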

15.4. Correlation between sum scores and score gains

We tested whether individuals with low g improved their scores more than those with high g by correlating gain scores with pretest RSPM scores for each of the four research groups. Because gain scores tend to be negatively correlated with pretest scores as a function of unreliability (see Cronbach, 1990; Nunnally & Bernstein, 1994), we corrected the correlations using Tucker, Damarin, and Messick's (1966) formula 63. Using this formula, one adds to each correlation the term $(SD_{\text{pretest}} / SD_{\text{gain}})(1 - r_{xx,\text{pretest}})$, where $r_{xx,\text{pretest}}$ is the reliability of the pretest.

Table 3
Proportion of sample selecting the correct answer on items of Raven's Standard Progressive Matrices, by group

Set A              Set B              Set C              Set D              Set E
Item Black Other   Item Black Other   Item Black Other   Item Black Other   Item Black Other
   1  1.00  1.00     13  1.00  1.00     25  1.00   .97     37  1.00  1.00     49   .74   .90
   2   .97  1.00     14  1.00  1.00     26   .96  1.00     38   .99  1.00     50   .64   .90
   3   .97  1.00     15  1.00  1.00     27   .96  1.00     39   .89  1.00     51   .79   .97
   4  1.00   .97     16   .91   .97     28   .86   .93     40   .92  1.00     52   .56   .83
   5  1.00  1.00     17   .96   .97     29   .94   .97     41   .96  1.00     53   .52   .83
   6   .99  1.00     18   .85  1.00     30   .76   .83     42   .92  1.00     54   .35   .76
   7   .94   .97     19   .77   .66     31   .88   .97     43   .77  1.00     55   .42   .79
   8   .91   .93     20   .79   .97     32   .50   .79     44   .76   .93     56   .21   .69
   9  1.00   .97     21   .83   .97     33   .74   .90     45   .71   .97     57   .30   .41
  10   .91   .97     22   .92  1.00     34   .61   .79     46   .79   .93     58   .12   .41
  11   .83   .90     23   .80   .90     35   .53   .69     47   .29   .41     59   .02   .17
  12   .68   .83     24   .59   .83     36   .06   .35     48   .26   .38     60   .11   .21

Note. Other = White, Indian, and Colored.
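The correction in Section 15.4 amounts to adding a single term to each observed gain–pretest correlation. The following minimal Python sketch implements that term exactly as stated in the text; it has not been checked against Tucker et al.'s original derivation, and the function name and input values are hypothetical.

```python
def tucker_corrected_r(r_gain_pretest: float, sd_pretest: float,
                       sd_gain: float, reliability_pretest: float) -> float:
    """Observed gain-pretest correlation plus (SD_pretest / SD_gain) * (1 - reliability)."""
    if sd_gain <= 0:
        raise ValueError("SD of gain scores must be positive")
    if not 0.0 <= reliability_pretest <= 1.0:
        raise ValueError("reliability must lie in [0, 1]")
    correction = (sd_pretest / sd_gain) * (1.0 - reliability_pretest)
    return r_gain_pretest + correction


# Hypothetical inputs: observed r = -.50, pretest SD = 6, gain SD = 4, reliability = .80.
print(round(tucker_corrected_r(-0.50, 6.0, 4.0, 0.80), 2))
```

With these hypothetical inputs, the correction term is 1.5 × .20 = .30, so an observed r of −.50 becomes −.20. A less reliable pretest, or a smaller SD of the gain scores, produces a larger upward adjustment. A perfectly reliable pretest leaves r unchanged.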

16. Results

16.1. Descriptive statistics

Internal consistencies (Cronbach's α) of the RSPM ranged from .76 to .86 for the pre- and posttests, respectively. Table 3 shows the proportion of each group that selected the correct answer on each of the 60 items of the pretest. Across the 60 items, the order of the p values was almost identical for Blacks and White/Indian/Coloreds (r = .92, p = .00).

Table 4 shows the means and standard deviations of the total RSPM scores for the four groups, along with the d effect sizes representing the difference between pre- and posttest scores (Cohen, 1988). First, we examined whether there was an effect of race (Black vs. White/Indian/Colored) and group (experimental vs. control) on the pretest scores. There was a significant effect of race (F(1, 91) = 24.13, p = .00, η² = .21) but not of group (F(1, 91) = 2.28, p = .14, η² = .02). This means that the mean pretest scores of Blacks (M = 44.44, SD = 6.65) were lower than those of White/Indian/Coloreds (M = 51.41, SD = 5.05), and that the mean pretest scores of the experimental and control groups were comparable (M = 45.53, SD = 7.04, and M = 48, SD = 6.7, respectively).

Second, we investigated the effects of training on the posttest scores by performing a two-way ANCOVA on the total posttest scores, with race and group as factors and the total pretest scores as the covariate. There was a significant effect of group (F(1, 95) = 13.81, p = .00, η² = .13) and of race (F(1, 90) = 3.99, p = .05, η² = .04), but not of the group by race interaction (F(1, 90) = 0.28, p = .60, η² = .00). These results indicate that the training was equally effective for the Black and the White/Indian/Colored students. Posttest scores of Blacks (M = 49.41, SD = 5.91), however, remained significantly lower (F(1, 91) = 28.33, p = .00) than those of Whites/Indians/Coloreds (M = 55.59, SD = 3.55). Although posttest scores of the experimental group (M = 51.65, SD = 5.53) were higher than those of the control group (M = 50.8, SD = 6.65), the difference between the two groups was nonsignificant (F(1, 91) = 0.85, p = .36).

Table 4
Pre- and posttest mean Raven's scores, standard deviations, and mean effect sizes for Black and White/Indian/Colored students

Group                          Pretest M     SD  Percentile  Posttest M     SD  Percentile  Effect size
Black experimental (n = 40)        43.78   6.64          14       50.10   5.31          41         0.95
Black control (n = 26)             45.46   6.69          16       48.35   6.71          31         0.43
Other experimental (n = 15)        50.20   6.05          41       55.80   3.76          75         0.93
Other control (n = 14)             52.71   3.45          55       55.36   3.43          68         0.77

Note. Scores are raw RSPM scores. Percentiles are based on US adult norms; see Raven, Raven, and Court's (2000) Table SPM13. Other = White, Indian, and Colored.

The correlation between scores before and after the training was .84 (p = .00) for the experimental group and .90 (p = .00) for the control group, showing that the training had only a limited effect on the rank order of individuals' scores. This means that the test measures largely, though not entirely, the same constructs on both occasions.

16.2. Correlation between score gains and g loadedness

We estimated item effect sizes for each of the four groups (race by condition) by dividing the difference between mean pretest and posttest scores by the standard deviation of the pretest scores of Black and White/Indian/Colored students, respectively. We then calculated the correlations between these effect sizes and the g loadings taken from Lynn et al. The correlations were −.24 (p = .10) for the Black experimental group, −.21 (p = .20) for the White/Indian/Colored experimental group, −.08 (p = .59) for the Black control group, and −.41 (p = .01) for the White/Indian/Colored control group. Small sample sizes usually attenuate correlations (Hunter & Schmidt, 1990), and collapsing the groups did result in higher average correlations: −.39 for the complete experimental group and −.26 for the complete control group.

16.3. g loadings

In the combined experimental and control group, separate principal axis factor analyses of the pretest and the posttest scores yielded a first unrotated factor explaining 22% of the variance in the pretest scores and 18% of the variance in the posttest scores. These findings suggest that the g loadedness of the RSPM decreased substantially after the Mediated Learning Experience.

16.4. Correlation between score gains and sum score

Correlating score gains with RSPM total scores resulted in values of −.60 (p = .00) for the Black experimental group, −.18 (p = .38) for the Black control group, −.82 (p = .00) for the White/Indian/Colored experimental group, and −.48 (p = .08) for the White/Indian/Colored control group. After applying the correction formula of Tucker et al. (1966), these correlations became −.39, −.08, −.61, and −.35, respectively. Overall, these correlations show that low-g persons improved their scores more strongly than high-g persons.

17. Discussion

Skuy et al. (2002) hypothesized that the low-quality education of Blacks in South Africa would lead IQ tests to underestimate their cognitive abilities. Groups of Black and White/Indian/Colored students took the Raven's Progressive Matrices twice and in between received Feuerstein's Mediated Learning Experience. The test scores went up substantially in all groups. Evidence for an authentic change in the g factor requires broad transfer, or generalizability across a wide variety of cognitive performance. However, Skuy et al. showed that the gains did not generalize to scores on another, highly similar test or to external criteria, and were therefore hollow. Because the score gains were in some cases quite large (14 IQ points for the Black experimental group), the question becomes: what is it that improved?
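The "14 IQ points" figure can be reproduced by converting the Black experimental group's pretest-SD effect size from Table 4 (d = 0.95) to the IQ metric. This assumes the conventional IQ standard deviation of 15, since the text does not state which conversion was used:

```latex
\Delta_{\mathrm{IQ}} \approx d \times 15 = 0.95 \times 15 = 14.25 \approx 14 \ \text{IQ points}
```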

The findings show that the correlations between score gains and the g loadedness of the items were −.39 for the complete experimental group and −.26 for the complete control group. However, because the g loadings and gain scores are measured at the item level, their reliabilities are not high, which substantially attenuates the correlation between g and d. Moreover, the RSPM does not measure g perfectly: Jensen (1998a, p. 91) estimates its g loading at .83. If we estimate the reliability of the g vector at .70 and the reliability of the gain score vector at .50, correcting for unreliability and for deviation from perfect construct validity (measuring g only) yields estimated true correlations of −.80 and −.53, respectively. These values should be taken as underestimates; controlling for additional artifacts would bring them closer to the very strong negative correlation found in the meta-analysis.
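The corrections in the previous paragraph can be approximated with the classical disattenuation formula. The observed vector correlation is divided by the square root of the product of the two vector reliabilities and by the RSPM's g loading. This sketch shows one reading that matches the reported numbers, because the paper does not spell out the exact order of corrections. The function name is hypothetical, and the inputs (.70, .50, .83, −.39, −.26) are the values given in the text.

```python
import math


def corrected_vector_correlation(r_observed: float, rel_g_vector: float,
                                 rel_gain_vector: float, g_loading_of_test: float) -> float:
    """Correct an observed correlation for unreliability in both vectors and
    for the test's imperfect g loading (imperfect construct validity)."""
    attenuation = math.sqrt(rel_g_vector * rel_gain_vector) * g_loading_of_test
    return r_observed / attenuation


# Inputs from the text: reliabilities .70 (g vector) and .50 (gain vector), g loading .83.
for group, r_obs in [("experimental", -0.39), ("control", -0.26)]:
    print(group, round(corrected_vector_correlation(r_obs, 0.70, 0.50, 0.83), 2))
```

This gives about −.79 and −.53. The small gap to the reported −.80 presumably reflects rounding of the observed correlation to −.39.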

The findings suggest that the g loadedness of the test decreased substantially after training. We found substantial negative correlations between gain scores and RSPM total scores. Table 4 shows that the total score variance decreased after training, which is in line with low-g subjects increasing more than high-g subjects. As a rule, high-g individuals profit the most from training, as is reflected in the ubiquitous positive correlation between IQ scores and training performance (Jensen, 1980; Schmidt & Hunter, 1998). These findings could therefore be interpreted as an indication that Feuerstein's Mediated Learning Experience is not g-loaded, in contrast with regular trainings, which are clearly g-loaded. Substantial negative correlations between gain scores and RSPM total scores are no definite proof of this hypothesis, but they are in line with it. Further support for our hypothesis that the Feuerstein training has little or no g loadedness comes from Coyle (2006), who showed that gain scores loaded virtually zero on the g factor. Moreover, Skuy et al. reported that the predictive validity of their measure did not increase when the second Raven score was used. The fact that individuals with low g gained more than those with high g could be interpreted as a further indication that the Mediated Learning Experience was not g-loaded. It should be noted, however, that Feuerstein most likely did not intend his intervention to be g-loaded: he was interested in increasing the performance of low scorers on both tests and external criteria.

18. General discussion

IQ scores are by far the best general predictor of success in education, job training, and work. However, there are many ways in which these IQ scores can be increased, for instance by means of retesting or participating in a learning potential training program. What conclusions can be drawn from such score gains? We tested Jensen's (1998a) hypothesis that the effects of training on abilities can be summarized in terms of Carroll's three-stratum hierarchical factor model. The test used a meta-analysis of test–retest data from Dutch, British, and American test batteries and learning potential data from South Africa based on Raven's Progressive Matrices. The meta-analysis convincingly shows that test–retest score gains are not g-loaded. The findings from the learning potential study are clearly in line with this. Once the attenuation caused by unreliability and other artifacts is taken into account, the correlation between the g loadings of items and the gains on items is roughly comparable to the one found in the meta-analysis of test batteries. The data suggest that the g loadedness of item scores decreases after the intervention training. Te Nijenhuis et al.'s (2001) finding that practice and coaching reduced the g loadedness of their test scores strengthens the present findings based on item scores. The findings also show that the largest score increases were made by the low-g participants, not by the high-g participants as is common in training situations. This suggests that the intervention training is not g-loaded.

Our findings fit quite well with the hierarchical model of intelligence. The generalizability of test scores resides predominantly in the g component, whereas the test-specific ability component and the narrow ability component are virtually non-generalizable. This is evidenced, for instance, by the earlier finding that adding verbal tests or numerical tests to a g score yielded only very small incremental validity (Ree & Earles, 1991; Ree et al., 1994). Additionally, Ericsson and Lehmann (1996) reported immense gains on a memory task that tapped one narrow ability, but no improvement on comparable memory tasks that tapped another narrow ability. Because the score gains are not related to g, the generalizable g component decreases. Since the Feuerstein training itself is probably not g-loaded either, it is easy to understand why the score gains did not generalize to scores on the cognitively loaded Representational Stencil Design Test. For a similar reason, the score gains did not generalize to g-loaded external criteria: the correlation of the RSPM scores with performance in the end-of-year psychology examination did not significantly improve after mediation. Reeve and Lam (2005) claimed that retesting does not change the nature of what is being tested, but our findings suggest the opposite.

19. Limitations of the studies

Our meta-analysis and our analysis of the South African study rely heavily on the method of correlated vectors (MCV), which has recently been shown to have limitations. Dolan and Lubke (2001) have shown that, when comparing groups, substantial positive vector correlations can still be obtained even when groups differ not only on g but also on factors uncorrelated with g. Ashton and Lee (2005) showed that associations of a variable with non-g sources of variance can produce a vector correlation of zero even when the variable is strongly associated with g. They suggest that the g loadings of a subtest are sensitive to the nature of the other subtests in a battery, so that a specific sample of subtests may cause a spurious correlation between the vectors. Notwithstanding these limitations, studies using MCV continue to appear (see, for instance, Colom, Haier, & Jung, in press; Hartmann, Kruuse, & Nyborg, in press; Lee et al., 2006). Because our meta-analysis covers a large number of MCV studies, its outcomes may contribute to the discussion of the method's limitations.

A principle of meta-analysis is that the amount of information contained in one individual study is quite modest. One should therefore analyze all studies on one topic and correct for artifacts, which strongly increases the amount of information. Our meta-analytical value of r = −1.06 is virtually identical to the theoretically expected correlation between g and d of −1.00. This holds some promise that psychometric meta-analysis of MCV studies is a powerful way of reducing some of the limitations of MCV. An alternative methodological approach is to restrict oneself to the rare datasets that allow structural equation modeling; from a meta-analytical point of view, however, these studies yield only a modest amount of information.

Additional meta-analyses of studies employing MCV are necessary to establish the validity of combining MCV with psychometric meta-analysis. Many would probably agree that a high positive meta-analytical correlation between measures of g and measures of another construct implies that g plays a major role, and that a meta-analytical correlation of −1.00 implies that g plays no role. However, it is not clear what meta-analytical correlation to expect from MCV when g plays only a modest role. The present meta-analysis concerned a construct that clearly has an inverse relationship with g. It would now be informative to carry out meta-analyses of studies on variables that are strongly linked to g and on variables that are modestly linked to g. An example of the latter would be secular score gains, which according to Lynn's (1990) nutrition theory should be modestly g-loaded.

The sample sizes in the South African study are not large, but they are still larger than those in many other learning potential studies, where an N ≈ 10 is not unusual. A reanalysis of the many existing studies on dynamic testing could lead to a meta-analysis with a large combined N. The mean posttest score was quite high, so a ceiling effect may have occurred for the White/Indian/Colored group, leading to an underestimation of the experimental score gain for this group.

Instead of testing the hypothesis with a strongly unidimensional test such as the RSPM, it would be better to use a multidimensional test. Moreover, a large sample would allow more rigorous data-analytical techniques and thus more definitive results. However, to the best of our knowledge, no datasets meeting these requirements exist, and the Skuy et al. study is arguably the best South African learning potential study.

20. Score gains as low-quality measures of motivation

Because criterion-related validity depends strongly on g, te Nijenhuis et al.'s finding of lowered g loadings after training should result in lowered criterion-related validity. However, the empirical findings show the opposite. Virtually all test–retest and test preparation studies of cognitive and scholastic aptitude tests that reported criterion-related validities for both administrations show small to modest increases in criterion-related validity for the second or third test score (see Allalouf & Ben-Shakhar, 1998; Bashi, 1976; Coyle, 2006; Hausknecht, Trevor, & Farr, 2002; Jones, 1986; Linn, 1977; Olsen & Schrader, 1959; Ortar, 1960; Powers, 1985; Reeve & Lam, 2005). In Allalouf and Ben-Shakhar's (1998) carefully designed study of a university entrance test, the experimental group received an intensive 40-h test coaching program, while the control group did not. The criterion-related validity of the retest increased for both groups. Most importantly, the increase was the same for both groups; it was not larger for the experimental group.

In a little-known but carefully designed large-scale learning potential study, Resing (1990; see Table 4.23) compared an experimental group that received a pretest, a learning potential training, and a posttest with a control group that received only the pretest and the posttest. The mean criterion-related validity of the various second scores was .62 for both the experimental and the control group. In other words, learning potential training added no criterion-related validity beyond what simple retesting produced. The findings of both Resing and Allalouf and Ben-Shakhar suggest that cognitive interventions do not increase criterion-related validity more than simple retesting does.

g and the personality measure conscientiousness have been shown to make an excellent combination of predictors (Schmidt & Hunter, 1998). Conscientiousness represents, among other characteristics, persistence, a will to achieve, and the ability to focus effort on a goal. A field study on test preparation using actual job applicants (Clause, Delbridge, Schmitt, Chan, & Jennings, 2001) showed that motivation to perform well on the test correlated .25 with test performance. One could speculate that score increases do not reflect a true cognitive component but instead function as low-quality measures of motivation. Further, the increase in validity due to retesting and learning potential training is modest compared with the large increase obtainable from personality questionnaires. Personality testing might therefore provide a less expensive and more accurate alternative.

21. Effectiveness of various training formats

Components of the mediation training used by Skuy et al. (2002) are similar to the test training used in te Nijenhuis et al. (2001). Both the Dutch and the South African training took 3 h. However, the Dutch training focused on two different test formats, whereas the South African training dealt with only one. The test training by Lloyd and Pidgeon (1961) took even less time, namely two half-hour segments, each focusing on one test format. The effect sizes in all studies were roughly comparable. This suggests that the methodologies employed by te Nijenhuis et al. and by Lloyd and Pidgeon were more efficient than those used by Skuy et al. Possibly the components of the mediation training that are absent from the other two formats do not raise test scores and could therefore be left out. If so, it might be possible to increase scores on the RSPM by one SD with a relatively simple 1-h training.

22. Generalizability of findings

Can these findings of hollow score gains after test–retest, test practice, and Mediated Learning Experience training be generalized to other studies that found training-induced score gains? Ericsson and Lehmann (1996) reported tremendous score increases after intensive training on numeric memory tests, but these gains did not generalize in the least to verbal memory tests. Such gains on one narrow ability do not generalize to another narrow ability under the same broad ability, and are therefore hollow. Similarly, Jensen (1998b) showed that score gains due to adoption were not on the g factor and were therefore most likely hollow.

Rushton (1999) argued that intergenerational score gains are not linked to g, which suggests that the Flynn effect may be empty, but he was strongly criticized by Flynn (1999, 2000). In studies on the Flynn effect, the score gains found in cross-sectional studies are largest on the RSPM (Flynn, 1987). Lynn (1998) has suggested that a substantial part of these intergenerational gains on the RSPM is generalizable, in that it reflects higher g. The remaining part, he argued, is hollow and should be interpreted as a schooling effect. The RSPM does require the application of the mathematical principles of addition, subtraction, progression, and the distribution of values. Over the three decades (1950s–1980s) in which RSPM scores rose, increasing proportions of 15- to 18-year-olds stayed in school. There they learned math skills that they could apply to solving matrices problems. Our findings could be interpreted as support for Lynn's hypothesis that the score gains on the RSPM are partly hollow. Despite the high g loading of its sum score, the RSPM is quite sensitive to test–retest effects and training effects. Some studies on the Flynn effect (Lynn & Hampson, 1986; Teasdale & Owen, 1989) show that the increase in scores is largely concentrated in the lower segments of the IQ distribution. Our finding that low scorers gain the most after training may further support the notion that part of the Flynn effect on the RSPM is hollow. Finally, Wicherts et al.'s (2004) findings show that in some of their datasets the secular score gains are most strongly linked to broad, narrow, and test-specific abilities. This indicates that an important part of the gains is non-generalizable.

Ceci (1991) showed that increased schooling leads to higher IQ scores, but are these gains highly specific or predominantly generalizable? It would be interesting to apply the techniques we used in this study to the findings from previous intervention studies. It may be that biological interventions (such as diet, vitamin supplements, or vaccination against infectious disease), rather than psychological or educational interventions, are the most cost-effective method of producing true changes in g and broad abilities. There may be a biological barrier between the first stratum and the second stratum that restricts the effects of behavioral interventions to narrow abilities and test specificities.

Acknowledgement

We would like to thank Mervyn Skuy for his permission to use his dataset.

Thanks to Marié de Beer, Raegan Murphy, Welko Tomic, Art Jensen, and Frank Schmidt for feedback on previous versions of this paper.

Thanks also to Arne Evers, Wilma Resing (Dutch Test Committee), and Andress Kooij (Harcourt) for helping to locate test–retest studies.

References

Ackerman, P. L. (1986). Individual differences in information processing: An investigation of intellectual abilities. Intelligence, 10, 101–139.

Ackerman, P. L. (1987). Individual differences in skill learning: An integration of psychometric and information processing skills. Psychological Bulletin, 102, 3–27.

Allalouf, A., & Ben-Shakhar, G. (1998). The effect of coaching on the predictive validity of scholastic aptitude tests. Journal of Educational Measurement, 35(1), 31–47.

Ashton, M. C., & Lee, K. (2005). Problems with the method of correlated vectors. Intelligence, 33, 431–444.

Bashi, Y. (1976). Verbal and non-verbal abilities of 4th, 6th and 8th grade students in the Arab educational system in Israel. Jerusalem: Hebrew University, School of Education.

298 J. te Nijenhuis et al. / Intelligence 35 (2007) 283–300

Bennett, G. K., Seashore, H. G., & Wesman, A. G. (1974). Differential Aptitude Tests (5th ed.). Manual. New York: The Psychological Corporation.

Bleichrodt, N., Resing, W. C. M., Drenth, P. J. D., & Zaal, J. N. (1987). Intelligentie-meting bij kinderen: Empirische en methodologische verantwoording van de geReviseerde Amsterdamse Kinder Intelligentie Test [Measuring the intelligence of children: Empirical and methodological justification of the Revised Amsterdam Children Intelligence Test]. Lisse, the Netherlands: Swets.

Boeyens, J. C. A. (1989). Learning potential: An empirical investigation. Pretoria, South Africa: Human Science Research Council.

Bosch, F. (1973). Inventarisatie, beschrijving en onderzoek m.b.t. de wijzigingen van de GATB, incl. test-hertest onderzoek (No. Pz3bRp0120) [Stock-taking, description and research concerning the modifications of the GATB, including a test–retest study]. Utrecht, the Netherlands: Nederlandse Spoorwegen.

Carroll, J. B. (1993). Human cognitive abilities: A survey of factor analysis studies. Cambridge: Cambridge University Press.

Ceci, S. J. (1991). How much does schooling influence general intelligence and its cognitive components? A reassessment of the evidence. Developmental Psychology, 27, 703–722.

Christian, K., Bachnan, H. J., & Morrison, F. J. (2001). Schooling and cognitive development. In R. J. Sternberg & E. L. Grigorenko (Eds.), Environmental effects on cognitive abilities (pp. 287–335). Mahwah, NJ: Erlbaum.

Clause, C. S., Delbridge, K., Schmitt, N., Chan, D., & Jennings, D. (2001). Test preparation activities and employment test performance. Human Performance, 14, 149–167.

Cohen, J. (1988). Statistical power analysis for the behavioral sciences. Hillsdale: Lawrence Erlbaum.

Colom, R., Jung, R. E., & Haier, R. J. (in press). Finding the g-factor in brain structure using the method of correlated vectors. Intelligence.

Covin, T. A. (1977). Stability of the WISC-R for 9-year-olds with learning difficulties. Psychological Reports, 40, 1297–1298.

Coyle, T. R. (2006). Test–retest changes on scholastic aptitude tests are not related to g. Intelligence, 34, 15–27.

Cronbach, L. J. (1990). Essentials of psychological testing. New York: HarperCollins.

de Villiers, A. B. (1999). Disadvantaged students' academic performance: Analysing the zone of proximal development. Unpublished DPhil thesis, University of Cape Town, South Africa.

de Wolff, C. J., & Buiten, B. (1963). Een factoranalyse van vier testbatterijen [A factor analysis of four test batteries]. Nederlands Tijdschrift voor Psychologie, 18, 220–239.

Dolan, C. V., & Lubke, G. (2001). Viewing Spearman's hypothesis from the perspective of multigroup PCA: A comment on Schönemann's criticism. Intelligence, 29, 231–245.

Drenth, P. J. D., Petrie, J. F., & Bleichrodt, N. (1968). Handleiding bij de Amsterdamse Kinder Intelligentie Test [Manual of the Amsterdam Children Intelligence Test]. Amsterdam: Vrije Universiteit.

Elliott, C. D. (1983). British Ability Scales. Manual 2: Technical handbook. Windsor, Great Britain: NFER-Nelson.

Engelbrecht, M. (1999). Leerpotensiaal as voorspeller van akademiese sukses van universiteitsstudente [Learning potential as predictor of the academic success of university students]. Unpublished DPhil thesis, Potchefstroom University for Christian Higher Education, South Africa.

Ericsson, K. A., & Lehmann, A. C. (1996). Expert and exceptional performance: Evidence of maximal adaptation to task constraints. Annual Review of Psychology, 47, 273–305.

Evers, A., & Lucassen, W. (1991). Handleiding DAT 83: Differentiële Aanleg Testserie [Manual DAT 83: Differential Aptitude Test series]. Amsterdam: Swets.

Fleishman, E. A., & Hempel, W. E. (1955). The relation between abilities and improvement with practice in a visual discrimination reaction task. Journal of Experimental Psychology, 49, 301–312.

Flynn, J. R. (1987). Massive IQ gains in 14 nations: What IQ tests really measure. Psychological Bulletin, 101, 171–191.

Flynn, J. R. (1999). Evidence against Rushton: The genetic loading of WISC-R subtests and the causes of between-group IQ differences. Personality and Individual Differences, 26, 373–379.

Flynn, J. R. (2000). IQ gains, WISC subtests and fluid g: g theory and the relevance of Spearman's hypothesis to race. In G. R. B. J. Goode (Ed.), The nature of intelligence (pp. 202–227). New York: Wiley.

Gaydon, V. P. (1988). Predictors of performance of disadvantaged adolescents on the Soweto/Alexandra gifted child programme. Unpublished MEd dissertation, University of the Witwatersrand, South Africa.

Gottfredson, L. S. (1997). Why g matters: The complexity of everyday life. Intelligence, 24(1), 79–132.

Gottfredson, L. S. (2002). g: Highly general and highly practical. In R. J. Sternberg & E. L. Grigorenko (Eds.), The general intelligence factor: How general is it? (pp. 331–380). Mahwah, NJ: Erlbaum.

Grigorenko, E. L., & Sternberg, R. J. (1998). Dynamic testing. Psychological Bulletin, 124, 75–111.

Haeck, W., Yeld, N., Conradie, J., Robertson, N., & Shall, A. (1997). A developmental approach to mathematics testing for university admissions and course placement. Educational Studies in Mathematics, 33, 71–91.

Hartmann, P., Kruuse, N. H. S., & Nyborg, H. (in press). Testing the cross-racial generality of Spearman's hypothesis in two samples. Intelligence.

Hausknecht, J. P., Trevor, C. O., & Farr, J. L. (2002). Retaking ability tests in a selection setting: Implications for practice effects, training performance, and turnover. Journal of Applied Psychology, 87(2), 243–254.

Hunter, J. E., & Schmidt, F. L. (1990). Methods of meta-analysis. London: Sage.

Hunter, J. E., & Schmidt, F. L. (2004). Methods of meta-analysis (2nd ed.). London: Sage.

Jensen, A. R. (1980). Bias in mental testing. London: Methuen.

Jensen, A. R. (1985). The nature of the black–white difference on various psychometric tests: Spearman's hypothesis. Behavioral and Brain Sciences, 8, 193–263.

Jensen, A. R. (1998a). The g factor: The science of mental ability. London: Praeger.

Jensen, A. R. (1998b). Adoption data and two g-related hypotheses. Intelligence, 25, 1–6.

Johnson, W., Bouchard, T. J., Jr., Krueger, R. F., McGue, M., & Gottesman, I. I. (2004). Just one g: Consistent results from three test batteries. Intelligence, 32, 95–107.

Johnson, W., te Nijenhuis, J., & Bouchard, T. J., Jr. (in press). Replication of the hierarchical visual-perceptual-image rotation model in de Wolff and Buiten's (1963) battery of 46 tests of mental ability. Intelligence.

Jones, R. J. (1986). A comparison of the predictive validity of the MCAT for coached and uncoached students. Journal of Medical Education, 61, 335–338.

Kaufman, A. S., & Kaufman, N. L. (1983). K-ABC: Kaufman Assessment Battery for Children. Interpretive manual. Circle Pines, MN: AGS.

Kooij, A. P., Rolfhus, E., Wilkins, C., Yang, Z., & Zhu, J. (2005). WAIS-III Nederlandstalige bewerking: Technisch rapport hernormering [WAIS-III adaptation in Dutch: Technical report renorming]. Amsterdam: Harcourt.

Kort, W., Schittekatte, M., Dekker, P. H., Verhaeghe, P., Compaan, E. L., Bosmans, M., & Vermeir, G. (2005). WISC-III NL: Wechsler Intelligence Scale for Children, Derde Editie NL. Handleiding en verantwoording [The Dutch WISC-III: Wechsler Intelligence Scale for Children, Third Edition for the Netherlands. Manual and justification]. Amsterdam: NIP.

Kulik, J. A., Bangert-Drowns, R. L., & Kulik, C. C. (1984). Effectiveness of coaching for aptitude tests. Psychological Bulletin, 95, 179–188.

Kulik, J. A., Kulik, C. C., & Bangert, R. L. (1984). Effects of practice on aptitude and achievement test scores. American Educational Research Journal, 21, 435–447.

Lee, K. H., Choi, Y. Y., Gray, J. R., Cho, S. H., Chae, J.-H., Lee, S., et al. (2006). Neural correlates of superior intelligence: Stronger recruitment of posterior parietal cortex. NeuroImage, 29(2), 578–586.

Linn, R. L. (1977). On the treatment of multiple scores for Law School Admission Test repeaters (Report LSAC-77-4). In Law School Admission Council, Reports of LSAC sponsored research: Volume III, 1975–1977. Princeton, NJ: Law School Admission Council.

Lipson, L. E. (1992). Relationship of static and dynamic measures to scholastic achievement of black pupils. Unpublished MEd dissertation, University of the Witwatersrand, South Africa.

Lloyd, F., & Pidgeon, D. A. (1961). An investigation into the effects of coaching on non-verbal test material with European, Indian and African children. British Journal of Educational Psychology, 31, 145–151.

Luteijn, F., & Barelds, D. P. H. (2005). GIT2: Groninger Intelligentie Test 2 [GIT2: Groningen Intelligence Test 2]. Amsterdam: Harcourt.

Lynn, R. (1990). The role of nutrition in secular increases in intelligence. Personality and Individual Differences, 11, 273–285.

Lynn, R. (1998). In support of the nutrition theory. In U. Neisser (Ed.), The rising curve: Long-term gains in IQ and related measures (pp. 207–215). Washington, DC: American Psychological Association.

Lynn, R., Allik, J., & Irwing, P. (2004). Sex differences on three factors identified in Raven's Standard Progressive Matrices. Intelligence, 32, 411–424.

Lynn, R., & Hampson, S. (1986). The rise of national intelligence: Evidence from Britain, Japan and the USA. Personality and Individual Differences, 7, 23–32.

Matarazzo, J. D., Carmody, T. P., & Jacobs, L. D. (1980). Test–retest reliability and stability of the WAIS: A literature review with implications for clinical practice. Journal of Clinical Neuropsychology, 2(2), 89–105.

McCormick, B. K., Dunlap, W. P., Kennedy, R. S., & Jones, M. B. (1983). The effects of practice on the Armed Forces Vocational Aptitude Test Battery. US Army Research Institute for the Behavioral and Social Sciences, Technical Report 602.

Mulder, J. L., Dekker, R., & Dekker, P. H. (2004). Kaufman Intelligentietest voor adolescenten en volwassenen (KAIT). Handleiding [Kaufman Intelligence Test for adolescents and adults (KAIT). Manual]. Leiden, the Netherlands: PITS.

Murphy, R. (2002). A review of South African research in the field of dynamic assessment. Unpublished MA dissertation, University of Pretoria (available online from http://upetd.up.ac.za/thesis/available/etd-05042002-161239).

Nel, A. (1997). Die voorspelling van akademiese sukses binne konteks van 'n alternatiewe universiteitstoelatingsbeleid [The prediction of academic success within the context of an alternative policy of university admission]. Unpublished MA dissertation, Rand Afrikaans University, South Africa.

Neubauer, A. C., & Freudenthaler, H. H. (1994). Reaction time in a sentence-picture verification test and intelligence: Individual strategies and effects of extended practice. Intelligence, 19, 193–218.

Nunnally, J. C., & Bernstein, I. H. (1994). Psychometric theory (3rd ed.). New York: McGraw-Hill.

Olsen, M., & Schrader, W. B. (1959). The use of preliminary and final Scholastic Aptitude Test scores in predicting college grades (College Entrance Examination Board Research and Development Reports and Statistical Reports 59-19). Princeton, NJ: Educational Testing Service.

Ortar, G. R. (1960). Improving test validity by coaching. Educational Research, 2, 137–142.

Powers, D. E. (1985). Effects of test preparation on the validity of Graduate Admission Test. Applied Psychological Measurement, 9, 179–190.

Raven, J., Raven, J. C., & Court, J. H. (2000). Standard Progressive Matrices. Raven manual: Section 3. Oxford: Oxford Psychologists Press.

Ree, M. J., & Carretta, T. R. (1994). The correlation of general cognitive ability and psychomotor tracking tests. International Journal of Selection and Assessment, 2, 209–216.

Ree, M. J., & Earles, J. A. (1991). Predicting training success: Not much more than g. Personnel Psychology, 44, 321–332.

Ree, M. J., Earles, J. A., & Teachout, M. S. (1994). Predicting job performance: Not much more than g. Journal of Applied Psychology, 79, 518–524.

Reeve, C. L., & Lam, H. (2005). The psychometric paradox of practice effects due to retesting: Measurement invariance and stable ability estimates in the face of observed score changes. Intelligence, 33, 535–549.

Resing, W. C. M. (1990). Intelligentie en leerpotentieel: Een onderzoek naar het leerpotentieel van jonge leerlingen uit het basis- en speciaal onderwijs [Intelligence and learning potential: A study into the learning potential of young students in basic and special education]. Amsterdam, the Netherlands: Swets.

Rushton, J. P. (1999). Secular gains in IQ are not related to the g factor and inbreeding depression, unlike black–white differences: A reply to Flynn. Personality and Individual Differences, 26, 381–389.

Rushton, J. P., Skuy, M., & Bons, T. A. (2004). Construct validity of Raven's Advanced Progressive Matrices for African and non-African engineering students in South Africa. International Journal of Selection and Assessment, 12(3), 220–229.

Schmidt, F. L., & Hunter, J. E. (1998). The validity and utility of selection methods in personnel psychology: Practical and theoretical implications of 85 years of research findings. Psychological Bulletin, 124(2), 262–274.

Schmidt, F. L., & Hunter, J. E. (1999). Theory testing and measurement error. Intelligence, 27(3), 183–198.

Schmidt, F. L., & Le, H. (2004). Software for the Hunter–Schmidt meta-analysis methods. University of Iowa, Department of Management and Organization, Iowa City, IA.

Schroots, J. J. F., & van Alphen de Veer, R. J. (1979). LDT: Leidse Diagnostische Test. Deel 1, Handleiding [LDT: Leiden Diagnostic Test. Part 1, Manual]. Lisse, the Netherlands: Swets.

Shochet, I. M. (1986). Manifest and potential performance in advantaged and disadvantaged students. Unpublished DPhil dissertation, University of the Witwatersrand, South Africa.

Skuy, M., Gewer, A., Osrin, Y., Khunou, D., Fridjon, P., & Rushton, J. P. (2002). Effects of mediated learning experience on Raven's Matrices scores of African and non-African university students in South Africa. Intelligence, 30, 221–232.

Swanson, H. E., & Lussier, C. M. (2001). A selective synthesis of the experimental literature on dynamic assessment. Review of Educational Research, 71, 321–363.

Teasdale, T. W., & Owen, D. R. (1989). Continuing secular increase in intelligence and a stable prevalence of high intelligence levels. Intelligence, 13, 255–262.

Tuma, J. M., & Appelbaum, A. S. (1980). Reliability and practice effects of WISC-R IQ estimates in a normal population. Educational and Psychological Measurement, 40, 671–678.

te Nijenhuis, J., Tolboom, E., Resing, W., & Bleichrodt, N. (2004). Does cultural background influence the intellectual performance of children from immigrant groups? Validity of the RAKIT intelligence test for immigrant children. European Journal of Psychological Assessment, 20, 10–26.

te Nijenhuis, J., & van der Flier, H. (1997). Comparability of GATB scores for immigrants and majority group members: Some Dutch findings. Journal of Applied Psychology, 82, 675–687.

te Nijenhuis, J., Voskuijl, O. F., & Schijve, N. B. (2001). Practice and coaching on IQ tests: Quite a lot of g. International Journal of Selection and Assessment, 9, 302–308.

Thorndike, R. L. (1985). The central role of general ability in prediction. Multivariate Behavioral Research, 20, 241–254.

Tucker, L. R., Damarin, F., & Messick, S. (1966). A base-free measure of change. Psychometrika, 31(4), 457–473.

van der Doef, M. P., Kwint, J. M., & van der Koppel (1989). Wat leren moeilijk lerende kinderen van de WISC-R? [What do children who have difficulties in learning learn from the WISC-R?]. Kind en Adolescent, 10, 136–141.

United States Department of Labor (1970). Manual for the USTES General Aptitude Test Battery. Section III: Development. Washington, DC: United States Department of Labor.

van Geffen (1972). De betrouwbaarheid van de GATB 1002-B op brugklasniveau [The reliability of the GATB 1002-B for the first class at secondary school]. Catholic University Nijmegen, the Netherlands: Psychology of Work and Organisation.

van Haasen, P. P., de Bruyn, E. E. J., Pijl, Y. J., Poortinga, Y. H., Lutje Spelberg, H. C., Vander Steene, G., et al. (1986). WISC-R: Wechsler Intelligence Scale for Children-Revised. Nederlandstalige uitgave [WISC-R: Wechsler Intelligence Scale for Children-Revised. Dutch edition]. Lisse, the Netherlands: Swets.

Wechsler, D. (1955). Manual for the Wechsler Adult Intelligence Scale. New York: The Psychological Corporation.

Wechsler, D. (1967). Manual for the Wechsler Preschool and Primary Scale of Intelligence. New York: The Psychological Corporation.

Wechsler, D. (1974). Manual for the Wechsler Intelligence Scale for Children-Revised. New York: The Psychological Corporation.

Wechsler, D. (1981). WAIS-R manual: Wechsler Adult Intelligence Scale-Revised. New York: The Psychological Corporation.

Wechsler, D. (1997). WAIS-III: Wechsler Adult Intelligence Scale-third edition and WMS-III: Wechsler Memory Scale-third edition. Technical manual. New York: The Psychological Corporation.

Wicherts, J. W., Dolan, C. V., Oosterveld, P., van Baal, G. C. V., Boomsma, D. I., & Span, M. M. (2004). Are intelligence tests measurement invariant over time? Investigating the nature of the Flynn effect. Intelligence, 32(5), 509–537.

Yeld, N., & Haeck, W. (1997). Educational histories and academic potential: Can tests deliver? Assessment and Evaluation in Higher Education, 22, 5–16.

Zaaiman, H. (1998). Selecting students for Mathematics and Science: The challenge facing higher education in South Africa. Pretoria, South Africa: HSRC Publishers.

Zaaiman, H., van der Flier, H., & Thijs, G. D. (2001). Dynamic testing in selection for an educational programme: Assessing South African performance on the Raven Progressive Matrices. International Journal of Selection and Assessment, 9, 258–269.

Zolezzi, S. A. (1992). Alternative selection measures for university undergraduate admissions. Unpublished MEd dissertation, University of the Witwatersrand, South Africa.

Zolezzi, S. A. (1995). The effectiveness of dynamic assessment as an alternative aptitude testing strategy. Unpublished DPhil dissertation, University of South Africa, South Africa.

Table 3
Proportion of sample selecting the correct answer on items of Raven's Standard Progressive Matrices, by group

Set A                 Set B                 Set C                 Set D                 Set E
Item  Black  Other    Item  Black  Other    Item  Black  Other    Item  Black  Other    Item  Black  Other
   1   1.00   1.00      13   1.00   1.00      25   1.00    .97      37   1.00   1.00      49    .74    .90
   2    .97   1.00      14   1.00   1.00      26    .96   1.00      38    .99   1.00      50    .64    .90
   3    .97   1.00      15   1.00   1.00      27    .96   1.00      39    .89   1.00      51    .79    .97
   4   1.00    .97      16    .91    .97      28    .86    .93      40    .92   1.00      52    .56    .83
   5   1.00   1.00      17    .96    .97      29    .94    .97      41    .96   1.00      53    .52    .83
   6    .99   1.00      18    .85   1.00      30    .76    .83      42    .92   1.00      54    .35    .76
   7    .94    .97      19    .77    .66      31    .88    .97      43    .77   1.00      55    .42    .79
   8    .91    .93      20    .79    .97      32    .50    .79      44    .76    .93      56    .21    .69
   9   1.00    .97      21    .83    .97      33    .74    .90      45    .71    .97      57    .30    .41
  10    .91    .97      22    .92   1.00      34    .61    .79      46    .79    .93      58    .12    .41
  11    .83    .90      23    .80    .90      35    .53    .69      47    .29    .41      59    .02    .17
  12    .68    .83      24    .59    .83      36    .06    .35      48    .26    .38      60    .11    .21

Note. Other = White, Indian, and Colored.

negatively correlated with pretest scores as a function of unreliability (see Cronbach, 1990; Nunnally & Bernstein, 1994), we corrected the correlations using Tucker, Damarin, and Messick's (1966) formula 63. Using this formula, one adds to each correlation the term (SD of the pretest / SD of the gain scores) × (1 − reliability of the pretest).
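The correction can be sketched in a few lines of Python. This is a minimal illustration that follows only the verbal rule stated above (it is not transcribed from Tucker et al., 1966); the function name `tucker_corrected_r` and the example inputs are hypothetical, not values from the study.

```python
def tucker_corrected_r(r_gain_pretest: float, sd_pretest: float,
                       sd_gain: float, reliability_pretest: float) -> float:
    """Add the unreliability term described in the text to an observed
    correlation between gain scores and pretest scores (sketch)."""
    if sd_gain <= 0:
        raise ValueError("SD of the gain scores must be positive")
    if not 0.0 <= reliability_pretest <= 1.0:
        raise ValueError("pretest reliability must lie in [0, 1]")
    # Term added to the observed correlation:
    # (SD pretest / SD gain score) * (1 - reliability of the pretest)
    term = (sd_pretest / sd_gain) * (1.0 - reliability_pretest)
    return r_gain_pretest + term


# Hypothetical example: observed r = -.50, SD pretest = 6, SD gain = 4, alpha = .80
print(round(tucker_corrected_r(-0.50, 6.0, 4.0, 0.80), 2))  # -0.2
```

With a perfectly reliable pretest the added term vanishes, so under this rule the correction matters only to the extent that the pretest contains measurement error.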

1.6. Results

1.6.1. Descriptive statistics

Internal consistencies (Cronbach's α) on the RSPM ranged from .76 to .86 for the pre- and posttests, respectively. Table 3 shows the proportion of each group that selected the correct answer on each of the 60 items of the pretest. Across the 60 items, the order of the p values was almost identical for Blacks and White/Indian/Coloreds (r = .92, p = .00).
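Cronbach's α compares the summed item variances with the variance of the total score. A minimal Python sketch follows, assuming a persons-by-items matrix of item scores (for the RSPM, 0 or 1 per item); the function name and the toy data are hypothetical.

```python
from statistics import pvariance


def cronbach_alpha(item_scores: list[list[float]]) -> float:
    """Cronbach's alpha; item_scores[p][i] is person p's score on item i."""
    if not item_scores:
        raise ValueError("no persons")
    k = len(item_scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Variance of each item across persons (population variances are used
    # throughout; the n vs. n - 1 choice cancels out in the ratio below).
    item_vars = [pvariance([row[i] for row in item_scores]) for i in range(k)]
    # Variance of the total (sum) score across persons
    total_var = pvariance([sum(row) for row in item_scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Toy data: four persons, three dichotomous items
toy = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(round(cronbach_alpha(toy), 2))  # 0.75
```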

Table 4 shows the means and standard deviations of the total RSPM scores for the four groups, along with the d effect sizes representing the difference between pre- and posttest scores (Cohen, 1988). First, we examined whether there was an effect of race (Black vs. White/Indian/Colored) and group (experimental vs. control) on the pretest scores. There was a significant effect of race (F(1, 91) = 24.13, p = .00, η² = .21) but not of group (F(1, 91) = 2.28, p = .14, η² = .02). This means that the mean pretest scores of Blacks (M = 44.44, SD = 6.65) were lower than those of White/Indian/Coloreds (M = 51.41, SD = 5.05), and that the mean pretest scores of the experimental and control groups were comparable (M = 45.53, SD = 7.04 and M = 48.00, SD = 6.70, respectively).

Second, we investigated the effects of training on the posttest scores by performing a two-way ANCOVA on the total posttest scores, with race and group as factors and the total pretest scores as the covariate. There was a significant effect of group (F(1, 95) = 13.81, p = .00, η² = .13) and of race (F(1, 90) = 3.99, p = .05, η² = .04), but not of the two-way interaction of group and race (F(1, 90) = 0.28, p = .60, η² = .00). These results indicate that the training was equally effective for Black and White/Indian/Colored students. Posttest scores of Blacks (M = 49.41, SD = 5.91), however, remained significantly lower (F(1, 91) = 28.33, p = .00) than those of Whites/Indians/Coloreds (M = 55.59, SD = 3.55). Although posttest scores of the experimental group (M = 51.65, SD = 5.53) were higher than those of the control group (M = 50.80, SD = 6.65), the difference between the two groups was nonsignificant (F(1, 91) = 0.85, p = .36).

Table 4
Pre- and posttest mean Raven's scores, standard deviations, and mean effect sizes for Black and White/Indian/Colored students

                 Black experimental   Black control        Other experimental   Other control
                 (n = 40)             (n = 26)             (n = 15)             (n = 14)
                 Pretest  Posttest    Pretest  Posttest    Pretest  Posttest    Pretest  Posttest
Raw scores, M    43.78    50.10       45.46    48.35       50.20    55.80       52.71    55.36
SD               6.64     5.31        6.69     6.71        6.05     3.76        3.45     3.43
Percentile       14       41          16       31          41       75          55       68
Effect size      0.95                 0.43                 0.93                 0.77

Note. Percentiles are based on US adult norms; see Raven, Raven, and Court's (2000) Table SPM13. Other = White, Indian, and Colored.
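The η² values reported with the F tests above can be recovered from each F ratio and its degrees of freedom with the usual conversion to partial eta squared, η²p = F·df_effect / (F·df_effect + df_error). The text does not state which η² variant was used, so treating them as partial η² is an assumption; the Python sketch below (function name hypothetical) reproduces all five reported values to two decimals.

```python
def partial_eta_squared(f: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    if f < 0 or df_effect <= 0 or df_error <= 0:
        raise ValueError("F must be non-negative and degrees of freedom positive")
    return f * df_effect / (f * df_effect + df_error)


# F ratios and degrees of freedom as reported in the text
reported = [
    ("race, pretest ANOVA", 24.13, 1, 91),   # reported eta^2 = .21
    ("group, pretest ANOVA", 2.28, 1, 91),   # reported eta^2 = .02
    ("group, ANCOVA", 13.81, 1, 95),         # reported eta^2 = .13
    ("race, ANCOVA", 3.99, 1, 90),           # reported eta^2 = .04
    ("group x race, ANCOVA", 0.28, 1, 90),   # reported eta^2 = .00
]
for label, f, df1, df2 in reported:
    print(f"{label}: {partial_eta_squared(f, df1, df2):.2f}")
```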

The correlation between scores before and after the training was .84 (p = .00) for the experimental group and .90 (p = .00) for the control group, showing that the training had only a limited effect on the rank order of individuals' scores. This means that the test strongly, but not perfectly, measures the same constructs on both occasions.

1.6.2. Correlation between score gains and g loadedness

We estimated effect sizes for each of the four groups (race by condition) by computing the difference between mean pretest and posttest scores, divided by the standard deviation of the pretest scores of Black and White/Indian/Colored students, respectively. We then calculated the correlations between these effect sizes and the g loadings taken from Lynn et al. (2004). Correlations were −.24 (p = .10) for the Black experimental group, −.21 (p = .20) for the White/Indian/Colored experimental group, −.08 (p = .59) for the Black control group, and −.41 (p = .01) for the White/Indian/Colored control group. Small sample sizes usually attenuate correlations (Hunter & Schmidt, 1990). Collapsing the groups indeed resulted in higher average correlations: −.39 for the complete experimental group and −.26 for the complete control group.

1.6.3. g loadings

Using the combined experimental and control groups, a principal axis factor analysis on the pretest and posttest scores, respectively, resulted in a first unrotated factor explaining 22% of the variance in the pretest scores and 18% of the variance in the posttest scores. These findings suggest that the g loadedness of the RSPM decreased substantially after Mediated Learning Experience.
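The share of variance carried by the first unrotated factor can be illustrated with a simplified one-factor principal axis solution: communalities replace the unit diagonal of the correlation matrix, the largest eigenvalue is extracted, and the communalities are re-estimated from the loadings until they stabilize. This pure-Python sketch is not the software or the extraction settings used in the study (those are not reported here); the function name and the example matrix are hypothetical, and a real analysis would use a linear-algebra library rather than hand-rolled power iteration.

```python
import math


def first_factor_share(r: list[list[float]], max_iter: int = 1000,
                       tol: float = 1e-10) -> float:
    """Share of total variance explained by the first unrotated factor of an
    iterated one-factor principal axis solution (simplified sketch)."""
    p = len(r)
    h2 = [1.0] * p  # start with unit communalities (first pass = principal component)
    loadings = [0.0] * p
    for _ in range(max_iter):
        # Reduced correlation matrix: communalities replace the unit diagonal.
        reduced = [[h2[i] if i == j else r[i][j] for j in range(p)] for i in range(p)]
        # Power iteration for the dominant eigenvector; assumes the dominant
        # eigenvalue is positive, as it is under a positive manifold.
        v = [1.0 / math.sqrt(p)] * p
        for _ in range(1000):
            w = [sum(reduced[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                return 0.0
            w = [x / norm for x in w]
            converged = max(abs(a - b) for a, b in zip(w, v)) < tol
            v = w
            if converged:
                break
        # Rayleigh quotient of the unit-length eigenvector gives the eigenvalue.
        eig = sum(v[i] * reduced[i][j] * v[j] for i in range(p) for j in range(p))
        loadings = [math.sqrt(max(eig, 0.0)) * x for x in v]
        new_h2 = [min(x * x, 1.0) for x in loadings]  # cap to avoid Heywood cases
        if max(abs(a - b) for a, b in zip(new_h2, h2)) < tol:
            break
        h2 = new_h2
    # Sum of squared loadings as a share of p standardized variables
    return sum(x * x for x in loadings) / p


# Hypothetical one-factor correlation matrix with true loadings .8, .7, .6
R = [[1.0, 0.56, 0.48], [0.56, 1.0, 0.42], [0.48, 0.42, 1.0]]
print(round(first_factor_share(R), 3))  # 0.497, i.e. (.64 + .49 + .36) / 3
```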

1.6.4. Correlation between score gains and sum score

Correlating score gains with RSPM total scores resulted in values of −.60 (p = .00) for the Black experimental group, −.18 (p = .38) for the Black control group, −.82 (p = .00) for the White/Indian/Colored experimental group, and −.48 (p = .08) for the White/Indian/Colored control group. After applying the correction formula of Tucker et al. (1966), these correlations became −.39, −.08, −.61, and −.35, respectively. Overall, these correlations show that low-g persons improved their scores more strongly than high-g persons.

1.7. Discussion

Skuy et al. (2002) hypothesized that the low-quality education of Blacks in South Africa would lead IQ tests to underestimate their cognitive abilities. Groups of Black and White/Indian/Colored students took Raven's Progressive Matrices twice and in between received Feuerstein's Mediated Learning Experience. The test scores went up substantially in all groups. Evidence for an authentic change in the g factor requires broad transfer, that is, generalizability across a wide variety of cognitive performance. However, Skuy et al. showed that the gains did not generalize to scores on another, highly similar test or to external criteria, and were therefore hollow. As the score gains were in some cases quite large (about 14 IQ points for the Black experimental group, i.e., d = 0.95 on a scale with SD = 15), the question becomes: what is it that improved?

The findings show that the correlations between score gains and the g loadedness of the items were −.39 for the complete experimental group and −.26 for the complete control group. However, because the g loadings and gain scores are measured at the item level, their reliabilities are not high, resulting in substantial attenuation of the correlation between g and d. Moreover, the RSPM does not measure g perfectly: Jensen (1998a, p. 91) estimates its g loading at .83. When we estimate the reliability of the g vector at .70 and the reliability of the gain score vector at .50, correcting only for unreliability and for deviation from perfect construct validity of g results in estimated true correlations of −.80 and −.53, respectively. These values should be taken as underestimates; controlling for additional artifacts will bring them closer to the very strong negative correlation found in the meta-analysis.
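The corrected values follow from dividing each observed correlation by the square root of the product of the two vector reliabilities and then by the g loading of the RSPM. The Python sketch below assumes that this standard correction for attenuation, plus the division by .83, is what was applied (the text gives the inputs but not the formula); the function name is hypothetical. It gives about −.79 and −.53; the small gap to the reported −.80 may reflect rounding of the observed −.39.

```python
import math


def disattenuate(r_obs: float, rel_x: float, rel_y: float,
                 construct_validity: float = 1.0) -> float:
    """Correct an observed correlation for unreliability in both vectors and,
    optionally, for imperfect construct validity of the g measure."""
    if min(rel_x, rel_y, construct_validity) <= 0:
        raise ValueError("reliabilities and validity must be positive")
    # Classical correction for attenuation, then division by the g loading
    return r_obs / math.sqrt(rel_x * rel_y) / construct_validity


# Inputs from the text: reliability .70 (g vector), .50 (gain vector),
# g loading of the RSPM .83 (Jensen, 1998a)
for label, r in [("experimental", -0.39), ("control", -0.26)]:
    print(label, round(disattenuate(r, 0.70, 0.50, 0.83), 2))
```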

The findings suggest that after training the g loadedness of the test decreased substantially. We found substantial negative correlations between gain scores and RSPM total scores. Table 4 shows that the total score variance decreased after training, which is in line with low-g subjects gaining more than high-g subjects. Since, as a rule, high-g individuals profit the most from training (as reflected in the ubiquitous positive correlation between IQ scores and training performance; Jensen, 1980; Schmidt & Hunter, 1998), these findings could be interpreted as an indication that Feuerstein's Mediated Learning Experience is not g-loaded, in contrast with regular training, which is clearly g-loaded. Substantial negative correlations between gain scores and RSPM total scores are no definite proof of this hypothesis, but they are in line with it. Further support for our hypothesis that the Feuerstein training has little or no g loadedness comes from Coyle (2006), who showed that gain scores loaded virtually zero on the g factor. Moreover, Skuy et al. reported that the predictive validity of their measure did not increase when the second Raven score was used. The fact that individuals with low g gained more than those with high g could be interpreted as an indication that the Mediated Learning Experience was not g-loaded. It should be noted, however, that Feuerstein most likely did not intend his intervention to be g-loaded; he was interested in increasing the performance of low scorers on both tests and external criteria.

1.8. General discussion

IQ scores are by far the best general predictor of success in education, job training, and work. However, there are many ways in which these IQ scores can be increased, for instance by means of retesting or participating in a learning potential training program. What conclusions can be drawn from such score gains? Jensen's (1998a) hypothesis that the effects of training on abilities can be summarized in terms of Carroll's three-stratum hierarchical factor model was tested in a meta-analysis of test–retest data using Dutch, British, and American test batteries, and with learning potential data from South Africa using Raven's Progressive Matrices. The meta-analysis convincingly shows that test–retest score gains are not g-loaded. The findings from the learning potential study are clearly in line with this: when the attenuation caused by unreliability and other artifacts is taken into account, the correlation between the g loadings of items and the gains on items has a value somewhat comparable to the one found in the meta-analysis for test batteries. The data suggest that the g loadedness of item scores decreases after the intervention training. Te Nijenhuis et al.'s (2001) finding that practice and coaching reduced the g loadedness of their test scores strengthens the present findings using item scores. The findings show that it is not the high-g participants who increase their scores the most, as is common in training situations, but the low-g persons who show the largest increases. This suggests that the intervention training is not g-loaded.

Our findings fit quite well with the hierarchical model of intelligence. The generalizability of test scores resides predominantly in the g component, whereas the test-specific ability component and the narrow ability component are virtually non-generalizable. This is evidenced, for instance, by the earlier finding that adding verbal tests or numerical tests to a g score resulted in only very small incremental validity (Ree & Earles, 1991; Ree et al., 1994). Additionally, Ericsson and Lehmann (1996) reported immense gains on a memory task focusing on one narrow ability but did not find any improvement on comparable memory tasks focusing on another narrow ability. As the score gains are not related to g, the generalizable g component decreases, and since it is not unlikely that the Feuerstein training itself is not g-loaded, it is easy to understand why the score gains did not generalize to scores on the cognitively loaded Representational Stencil Design Test. For a similar reason, the score gains did not generalize to g-loaded external criteria: the correlation of the RSPM scores with performance in the end-of-year psychology examination did not significantly improve after mediation. Reeve and Lam (2005) claimed that retesting does not change the nature of what is being tested, but our findings suggest the opposite.

1.9. Limitations of the studies

Our meta-analysis and our analysis of the South African study rely strongly on the method of correlated vectors (MCV), which has recently been shown to have limitations. Dolan and Lubke (2001) have shown that, when comparing groups, substantial positive vector correlations can still be obtained even when groups differ not only on g but also on factors uncorrelated with g. Ashton and Lee (2005) show that associations of a variable with non-g sources of variance can produce a vector correlation of zero even when the variable is strongly associated with g. They suggest that the g loadings of a subtest are sensitive to the nature of the other subtests in a battery, so that a specific sample of subtests may cause a spurious correlation between the vectors. Notwithstanding these limitations, studies using MCV continue to appear (see, for instance, Colom, Jung, & Haier, in press; Hartmann, Kruuse, & Nyborg, in press; Lee et al., 2006). The outcomes of our meta-analysis of a large number of studies using MCV may make an interesting contribution to the discussion of the limitations of the method.

296 J. te Nijenhuis et al. / Intelligence 35 (2007) 283–300

A principle of meta-analysis is that the amount of information contained in one individual study is quite modest. Therefore, one should carry out an analysis of all studies on one topic and correct for artifacts, leading to a strong increase in the amount of information. The fact that our meta-analytical value of r = −1.06 is virtually identical to the theoretically expected correlation between g and d of −1.00 holds some promise that a psychometric meta-analysis of studies using MCV is a powerful way of reducing some of the limitations of MCV. An alternative methodological approach is to limit oneself to the rare datasets that permit structural equation modeling. However, from a meta-analytical point of view, these studies yield only a quite modest amount of information.
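The core of a psychometric meta-analysis of vector correlations can be sketched as a "bare-bones" Hunter–Schmidt computation (sample-size-weighted mean r, observed variance, and the variance expected from sampling error alone), followed by a correction of the mean for unreliability. The Python sketch below is a simplified illustration under stated assumptions: the three studies, their sample sizes, and the reliabilities of .80 are hypothetical, the function names are ours, and the full procedure corrects for further artifacts that this sketch omits.

```python
import math

def bare_bones_meta(studies):
    """Bare-bones meta-analysis of correlations; studies is a list of (r, n)."""
    total_n = sum(n for _, n in studies)
    mean_r = sum(r * n for r, n in studies) / total_n
    # Sample-size-weighted observed variance of r across studies.
    var_obs = sum(n * (r - mean_r) ** 2 for r, n in studies) / total_n
    # Variance expected from sampling error alone, using the average N.
    mean_n = total_n / len(studies)
    var_error = (1 - mean_r ** 2) ** 2 / (mean_n - 1)
    var_residual = max(var_obs - var_error, 0.0)
    return mean_r, var_obs, var_error, var_residual

def correct_for_unreliability(r, rel_x, rel_y):
    """Classical disattenuation of a correlation."""
    return r / math.sqrt(rel_x * rel_y)

# Hypothetical vector correlations (g loadings vs. gains) and sample sizes.
studies = [(-0.70, 100), (-0.50, 50), (-0.80, 150)]
mean_r, var_obs, var_error, var_residual = bare_bones_meta(studies)
# Assumed reliabilities of the g vector and the d vector (illustrative only).
rho = correct_for_unreliability(mean_r, 0.80, 0.80)

print(f"mean r = {mean_r:.3f}, corrected = {rho:.3f}")
print(f"observed var = {var_obs:.4f}, sampling-error var = {var_error:.4f}")
```

Because the correction divides by a number below 1, a corrected mean that starts close to the bound can land outside [−1, 1] when the artifact estimates are imperfect, which is one way a value such as −1.06 can arise.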

Additional meta-analyses of studies employing MCV are necessary to establish the validity of the combination of MCV and psychometric meta-analysis. Most likely, many would agree that a high positive meta-analytical correlation between measures of g and measures of another construct implies that g plays a major role, and that a meta-analytical correlation of −1.00 implies that g plays no role. However, it is not clear what value of the meta-analytical correlation to expect from MCV when g plays only a modest role. After the present meta-analysis on a construct that clearly has an inverse relationship with g, it would be informative to carry out meta-analyses of studies on variables that are strongly linked to g and on variables that are modestly linked to g. An example of the latter would be secular score gains, which, according to Lynn's (1990) nutrition theory, should be modestly g-loaded.

The sample sizes in the South African study are not large, but they are still larger than those in many other studies of learning potential, where an N ≈ 10 is not unusual. A reanalysis of the many existing studies on dynamic testing could lead to a meta-analysis with a large combined N. The mean posttest score was quite high, so a ceiling effect may have occurred for the White/Indian/Colored group, leading to an underestimation of the experimental score gain for this group.

Instead of testing the hypothesis with a strongly unidimensional test such as the RSPM, it would be better to use a multidimensional test. Moreover, a large sample size would allow the use of more rigorous data-analytical techniques, leading to more definitive results. However, to the best of our knowledge, datasets meeting these requirements do not exist, and the Skuy et al. study is arguably the best South African learning potential study.

20. Score gains as low-quality measures of motivation

As criterion-related validity is strongly dependent on g, te Nijenhuis et al.'s finding of lowered g loadings after training should result in lowered criterion-related validity. However, the empirical findings show the opposite: virtually all test–retest and test preparation studies on cognitive tests and scholastic aptitude tests that reported the criterion-related validities of both the first and the later scores demonstrate small to modest increases in criterion-related validity for the second or third test score (see Allalouf & Ben-Shakhar, 1998; Bashi, 1976; Coyle, 2006; Hausknecht, Trevor, & Farr, 2002; Jones, 1986; Linn, 1977; Olsen & Schrader, 1959; Ortar, 1960; Powers, 1985; Reeve & Lam, 2005). In the carefully designed study by Allalouf and Ben-Shakhar (1998) of a university entrance test, the experimental group received an intensive 40-h test coaching program, whereas the control group did not. The criterion-related validity for the retest increased for both groups. Most importantly, the increase was the same: it was not larger for the experimental group.

In a little-known but carefully designed large-scale learning potential study, Resing (1990, see Table 4.23) compared an experimental group that received a pretest, a learning potential training, and a posttest against a control group that received only the pretest and the posttest. The mean criterion-related validity of the various second scores was .62 for both the experimental and the control group. Learning potential training did not result in incremental criterion-related validity over and above the validity resulting from simply retesting. The findings from both Resing and Allalouf and Ben-Shakhar suggest that cognitive interventions do not increase criterion-related validity more than simple retesting does.

g and the personality measure conscientiousness have been shown to make an excellent combination of predictors (Schmidt & Hunter, 1998). Conscientiousness represents, among other characteristics, persistence, a will to achieve, and the ability to focus effort on the goal. A field study on test preparation using actual job applicants (Clause, Delbridge, Schmitt, Chan, & Jennings, 2001) showed that motivation to perform well on the test correlated .25 with test performance. One could speculate that score increases do not reflect a true cognitive component but rather become low-quality measures of motivation. Further, since the increase in validity due to retesting and learning potential training is modest in comparison to the large increase obtainable from the use of personality questionnaires, personality testing might provide a less expensive and more accurate alternative.
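The "excellent combination of predictors" argument is a statement about incremental validity: the multiple correlation R of a criterion with two predictors, given their separate validities and their intercorrelation. The Python sketch below applies the standard two-predictor formula. The validities (.51 for g, .31 for conscientiousness) and the intercorrelations of .0 and .2 are illustrative assumptions, not estimates taken from the studies cited above, and `multiple_r` is our own helper name.

```python
import math

def multiple_r(r1, r2, r12):
    """Multiple correlation of a criterion with two predictors, given each
    predictor's validity (r1, r2) and the predictor intercorrelation r12."""
    r_squared = (r1 ** 2 + r2 ** 2 - 2 * r1 * r2 * r12) / (1 - r12 ** 2)
    return math.sqrt(r_squared)

# Illustrative validities only.
g_validity = 0.51
consc_validity = 0.31

for r12 in (0.0, 0.2):
    R = multiple_r(g_validity, consc_validity, r12)
    print(f"r12 = {r12:.1f}: R = {R:.2f}, gain over g alone = {R - g_validity:.2f}")
```

Under these assumptions the gain is largest when the two predictors are uncorrelated and shrinks as their overlap grows.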

21. Effectiveness of various training formats

Components of the mediation training used by Skuy et al. (2002) are similar to the test training used in te Nijenhuis et al. (2001). Both the Dutch training and the South African training took 3 h, but whereas the Dutch training focused on two different test formats, the South African training dealt with only one test format. The test training by Lloyd and Pidgeon (1961) took even less time, namely two half-hour segments, each focusing on one test format. The effect sizes in all studies were roughly comparable. This suggests that the methods employed by te Nijenhuis et al. and by Lloyd and Pidgeon were more efficient than those used by Skuy et al. It is possible that the components of the mediation training that are not present in the other two training formats are not effective in raising test scores and could therefore be left out. If so, it might be possible to increase scores on the RSPM by one SD with a relatively simple 1-h training.

22. Generalizability of findings

Can these findings of hollow score gains after test–retest, test practice, and Mediated Learning Experience training be generalized to other studies where training-induced score gains were found? Ericsson and Lehmann (1996) reported tremendous score increases after intensive training on numeric memory tests, but these gains did not generalize in the least to verbal memory tests. Such gains on one narrow ability do not generalize to another narrow ability clustering under the same broad ability and are therefore hollow. Similarly, Jensen (1998b) showed that score gains due to adoption were not on the g factor and were therefore most likely hollow.

Rushton (1999) argued that intergenerational score gains are not linked to g, suggesting that the Flynn effect may be empty, but he was strongly criticized by Flynn (1999, 2000). In studies on the Flynn effect, score gains found in cross-sectional studies are largest on the RSPM (Flynn, 1987). Lynn (1998) has suggested that a substantial part of these intergenerational score gains on the RSPM is generalizable (it does reflect higher g), but that the remaining part is hollow and should be interpreted as a schooling effect. The RSPM does require the application of the mathematical principles of addition, subtraction, progression, and the distribution of values. In the three decades (1950s–1980s) over which these increases in RSPM scores occurred, increasing proportions of 15- to 18-year-olds remained in school, where they learned math skills that they could apply to the solution of matrix problems. Our findings could be interpreted as support for Lynn's hypothesis of the partial hollowness of score gains on the RSPM. Notwithstanding the high g loading of its sum score, the RSPM is quite sensitive to test–retest effects and training effects. Some studies on the Flynn effect (Lynn & Hampson, 1986; Teasdale & Owen, 1989) show that the increase in scores is largely concentrated in the lower segments of the IQ distribution. Our finding that low scorers show the largest gains after training may additionally support the notion that part of the Flynn effect on the RSPM is hollow. Finally, Wicherts et al.'s (2004) findings show that in some of their datasets the secular score gains are most strongly linked to broad, narrow, and test-specific abilities, showing that an important part of the gains is non-generalizable.

Ceci (1991) showed that increased schooling leads to higher IQ scores, but are these gains highly specific or predominantly generalizable? It would be interesting to apply the techniques we used in this study to the findings from previous intervention studies. It may be that biological interventions (such as diet, vitamin supplements, and vaccination against infectious disease), rather than psychological or educational interventions, are the most cost-effective method of producing true changes in g and broad abilities. There may be a biological barrier between the first stratum and the second stratum that restricts the effects of behavioral interventions to narrow abilities and test specificities.

Acknowledgement

We would like to thank Mervyn Skuy for his permission to use his dataset.

Thanks to Marié de Beer, Raegan Murphy, Welko Tomic, Art Jensen, and Frank Schmidt for feedback on previous versions of this paper.

Thanks to Arne Evers, Wilma Resing (Dutch Test Committee), and Andress Kooij (Harcourt) for their help in locating test–retest studies.

References

Ackerman, P. L. (1986). Individual differences in information processing: An investigation of intellectual abilities. Intelligence, 10, 101–139.

Ackerman, P. L. (1987). Individual differences in skill learning: An integration of psychometric and information processing skills. Psychological Bulletin, 102, 3–27.

Allalouf, A., & Ben-Shakhar, G. (1998). The effect of coaching on the predictive validity of scholastic aptitude tests. Journal of Educational Measurement, 35(1), 31–47.

Ashton, M. C., & Lee, K. (2005). Problems with the method of correlated vectors. Intelligence, 33, 431–444.

Bashi, Y. (1976). Verbal and non-verbal abilities of 4th, 6th, and 8th grade students in the Arab educational system in Israel. Jerusalem: Hebrew University, School of Education.


Bleichrodt, N., Resing, W. C. M., Drenth, P. J. D., & Zaal, J. N. (1987). Intelligentie-meting bij kinderen: Empirische en methodologische verantwoording van de geReviseerde Amsterdamse Kinder Intelligentie Test [Measuring the intelligence of children: Empirical and methodological justification of the Revised Amsterdam Children Intelligence Test]. Lisse, the Netherlands: Swets.

Bennett, G. K., Seashore, H. G., & Wesman, A. G. (1974). Differential Aptitude Tests (5th ed.): Manual. New York: The Psychological Corporation.

Boeyens, J. C. A. (1989). Learning potential: An empirical investigation. Pretoria, South Africa: Human Science Research Council.

Bosch, F. (1973). Inventarisatie, beschrijving en onderzoek m.b.t. de wijzigingen van de GATB, incl. test–hertest onderzoek (No. Pz3bRp0120) [Stock-taking, description, and research concerning the modifications of the GATB, including a test–retest study]. Utrecht, the Netherlands: Nederlandse Spoorwegen.

Carroll, J. B. (1993). Human cognitive abilities: A survey of factor analysis studies. Cambridge University Press.

Ceci, S. J. (1991). How much does schooling influence general intelligence and its cognitive components? A reassessment of the evidence. Developmental Psychology, 27, 703–722.

Christian, K., Bachnan, H. J., & Morrison, F. J. (2001). Schooling and cognitive development. In R. J. Sternberg & E. L. Grigorenko (Eds.), Environmental effects on cognitive abilities (pp. 287–335). Mahwah, NJ: Erlbaum.

Clause, C. S., Delbridge, K., Schmitt, N., Chan, D., & Jennings, D. (2001). Test preparation activities and employment test performance. Human Performance, 14, 149–167.

Cohen, J. (1988). Statistical power analysis for the behavioral sciences. Hillsdale: Lawrence Erlbaum.

Colom, R., Jung, R. E., & Haier, R. J. (in press). Finding the g-factor in brain structure using the method of correlated vectors. Intelligence.

Covin, T. A. (1977). Stability of the WISC-R for 9-year-olds with learning difficulties. Psychological Reports, 40, 1297–1298.

Coyle, T. R. (2006). Test–retest changes on scholastic aptitude tests are not related to g. Intelligence, 34, 15–27.

Cronbach, L. J. (1990). Essentials of psychological testing. New York: HarperCollins.

de Villiers, A. B. (1999). Disadvantaged students' academic performance: Analysing the zone proximal development. Unpublished DPhil thesis, University of Cape Town, South Africa.

de Wolff, C. J., & Buiten, B. (1963). Een factoranalyse van vier testbatterijen [A factor analysis of four test batteries]. Nederlands Tijdschrift voor Psychologie, 18, 220–239.

Dolan, C. V., & Lubke, G. (2001). Viewing Spearman's hypothesis from the perspective of multigroup PCA: A comment on Schonemann's criticism. Intelligence, 29, 231–245.

Drenth, P. J. D., Petrie, J. F., & Bleichrodt, N. (1968). Handleiding bij de Amsterdamse Kinder Intelligentie Test [Manual of the Amsterdam Children Intelligence Test]. Amsterdam: Vrije Universiteit.

Elliott, C. D. (1983). British Ability Scales: Manual 2. Technical handbook. Windsor, Great Britain: NFER-Nelson.

Engelbrecht, M. (1999). Leerpotensiaal as voorspeller van akademiese sukses van universiteitstudente [Learning potential as predictor of the academic success of university students]. Unpublished DPhil thesis, Potchefstroom University for Christian Higher Education, South Africa.

Ericsson, K. A., & Lehmann, A. C. (1996). Expert and exceptional performance: Evidence of maximal adaptation to task constraints. Annual Review of Psychology, 47, 273–305.

Evers, A., & Lucassen, W. (1991). Handleiding DAT 83 Differentiële Aanleg Testserie [Manual DAT 83 Differential Aptitude Test series]. Amsterdam: Swets.

Fleishman, E. A., & Hempel, W. E. (1955). The relation between abilities and improvement with practice in a visual discrimination reaction task. Journal of Experimental Psychology, 49, 301–312.

Flynn, J. R. (1987). Massive IQ gains in 14 nations: What IQ tests really measure. Psychological Bulletin, 101, 171–191.

Flynn, J. R. (1999). Evidence against Rushton: The genetic loading of WISC-R subtests and the causes of between-group IQ differences. Personality and Individual Differences, 26, 373–379.

Flynn, J. R. (2000). IQ gains, WISC subtests and fluid g: g theory and the relevance of Spearman's hypothesis to race. In G. R. B. J. Goode (Ed.), The nature of intelligence (pp. 202–227). New York: Wiley.

Gaydon, V. P. (1988). Predictors of performance of disadvantaged adolescents on the Soweto/Alexandra gifted child programme. Unpublished MEd dissertation, University of the Witwatersrand, South Africa.

Gottfredson, L. S. (1997). Why g matters: The complexity of everyday life. Intelligence, 24(1), 79–132.

Gottfredson, L. S. (2002). g: Highly general and highly practical. In R. J. Sternberg & E. L. Grigorenko (Eds.), The general intelligence factor: How general is it? (pp. 331–380). Mahwah, NJ: Erlbaum.

Grigorenko, E. L., & Sternberg, R. J. (1998). Dynamic testing. Psychological Bulletin, 124, 75–111.

Haeck, W., Yeld, N., Conradie, J., Robertson, N., & Shall, A. (1997). A developmental approach to mathematics testing for university admissions and course placement. Educational Studies in Mathematics, 33, 71–91.

Hartmann, P., Kruuse, N. H. S., & Nyborg, H. (in press). Testing the cross-racial generality of Spearman's hypothesis in two samples. Intelligence.

Hausknecht, J. P., Trevor, C. O., & Farr, J. L. (2002). Retaking ability tests in a selection setting: Implications for practice effects, training performance, and turnover. Journal of Applied Psychology, 87(2), 243–254.

Hunter, J. E., & Schmidt, F. L. (1990). Methods of meta-analysis. London: Sage.

Hunter, J. E., & Schmidt, F. L. (2004). Methods of meta-analysis (2nd ed.). London: Sage.

Jensen, A. R. (1980). Bias in mental testing. London: Methuen.

Jensen, A. R. (1985). The nature of the black–white difference on various psychometric tests: Spearman's hypothesis. Behavioral and Brain Sciences, 8, 193–263.

Jensen, A. R. (1998a). The g factor: The science of mental ability. London: Praeger.

Jensen, A. R. (1998b). Adoption data and two g-related hypotheses. Intelligence, 25, 1–6.

Johnson, W., Bouchard, T. J., Krueger, R. F., Jr., McGue, M., & Gottesman, I. I. (2004). Just one g: Consistent results from three test batteries. Intelligence, 32, 95–107.

Johnson, W., te Nijenhuis, J., & Bouchard, T. J., Jr. (in press). Replication of the hierarchical visual-perceptual-image rotation model in de Wolff and Buiten's (1963) battery of 46 tests of mental ability. Intelligence.

Jones, R. J. (1986). A comparison of the predictive validity of the MCAT for coached and uncoached students. Journal of Medical Education, 61, 335–338.

Kaufman, A. S., & Kaufman, N. L. (1983). K-ABC: Kaufman Assessment Battery for Children. Interpretive manual. Circle Pines, MN: AGS.


Kooij, A. P., Rolfhus, E., Wilkins, C., Yang, Z., & Zhu, J. (2005). WAIS-III Nederlandstalige bewerking: Technisch rapport hernormering [WAIS-III adaptation in Dutch: Technical report on renorming]. Amsterdam: Harcourt.

Kort, W., Schittekatte, M., Dekker, P. H., Verhaeghe, P., Compaan, E. L., Bosmans, M., & Vermeir, G. (2005). WISC-III NL: Wechsler Intelligence Scale for Children, Derde Editie NL. Handleiding en verantwoording [The Dutch WISC-III: Wechsler Intelligence Scale for Children, Third Edition for the Netherlands. Manual and justification]. Amsterdam: NIP.

Kulik, J. A., Bangert-Drowns, R. L., & Kulik, C. C. (1984). Effectiveness of coaching for aptitude tests. Psychological Bulletin, 95, 179–188.

Kulik, J. A., Kulik, C. C., & Bangert, R. L. (1984). Effects of practice on aptitude and achievement test scores. American Educational Research Journal, 21, 435–447.

Lee, K. H., Choi, Y. Y., Gray, J. R., Cho, S. H., Chae, J.-H., Lee, S., et al. (2006). Neural correlates of superior intelligence: Stronger recruitment of posterior parietal cortex. NeuroImage, 29(2), 578–586.

Linn, R. L. (1977). On the treatment of multiple scores for Law School Admission Test repeaters (Report LSAC-77-4). In Law School Admission Council, Reports of LSAC Sponsored Research: Volume III, 1975–1977. Princeton, NJ: Law School Admission Council.

Lipson, L. E. (1992). Relationship of static and dynamic measures to scholastic achievement of black pupils. Unpublished MEd dissertation, University of Witwatersrand, South Africa.

Lloyd, F., & Pidgeon, D. A. (1961). An investigation into the effects of coaching on non-verbal test material with European, Indian, and African children. British Journal of Educational Psychology, 31, 145–151.

Luteijn, F., & Barelds, D. P. H. (2005). GIT2: Groninger Intelligentie Test 2 [GIT2: Groningen Intelligence Test 2]. Amsterdam: Harcourt.

Lynn, R. (1990). The role of nutrition in secular increases in intelligence. Personality and Individual Differences, 11, 273–285.

Lynn, R. (1998). In support of the nutrition theory. In U. Neisser (Ed.), The rising curve: Long-term gains in IQ and related measures (pp. 207–215). Washington, DC: American Psychological Association.

Lynn, R., Allik, J., & Irwing, P. (2004). Sex differences on three factors identified in Raven's Standard Progressive Matrices. Intelligence, 32, 411–424.

Lynn, R., & Hampson, S. (1986). The rise of national intelligence: Evidence from Britain, Japan and the USA. Personality and Individual Differences, 7, 23–32.

Matarazzo, J. D., Carmody, T. P., & Jacobs, L. D. (1980). Test–retest reliability and stability of the WAIS: A literature review with implications for clinical practice. Journal of Clinical Neuropsychology, 2(2), 89–105.

McCormick, B. K., Dunlap, W. P., Kennedy, R. S., & Jones, M. B. (1983). The effects of practice on the Armed Forces Vocational Aptitude Test Battery. US Army Research Institute for the Behavioral and Social Sciences, Technical Report 602.

Mulder, J. L., Dekker, R., & Dekker, P. H. (2004). Kaufman Intelligentietest voor adolescenten en volwassenen (KAIT): Handleiding [Kaufman Intelligence Test for adolescents and adults (KAIT): Manual]. Leiden, the Netherlands: PITS.

Murphy, R. (2002). A review of South African research in the field of dynamic assessment. Unpublished MA dissertation, University of Pretoria (available online from http://upetd.up.ac.za/thesis/available/etd-05042002-161239).

Nel, A. (1997). Die voorspelling van akademiese sukses binne konteks van 'n alternatiewe universiteitstoelatingsbeleid [The prediction of academic success within the context of an alternative policy of university admission]. Unpublished MA dissertation, Rand Afrikaans University, South Africa.

Neubauer, A. C., & Freudenthaler, H. H. (1994). Reaction time in a sentence-picture verification test and intelligence: Individual strategies and effects of extended practice. Intelligence, 19, 193–218.

Nunnally, J. C., & Bernstein, I. H. (1994). Psychometric theory (3rd ed.). New York: McGraw-Hill.

Olsen, M., & Schrader, W. B. (1959). The use of preliminary and final Scholastic Aptitude Test scores in predicting college grades (College Entrance Examination Board Research and Development Reports and Statistical Reports 59-19). Princeton, NJ: Educational Testing Service.

Ortar, G. R. (1960). Improving test validity by coaching. Educational Research, 2, 137–142.

Powers, D. E. (1985). Effects of test preparation on the validity of Graduate Admission Test. Applied Psychological Measurement, 9, 179–190.

Raven, J., Raven, J. C., & Court, J. H. (2000). Standard Progressive Matrices: Raven manual, Section 3. Oxford Psychologists Press.

Ree, M. J., & Carretta, T. R. (1994). The correlation of general cognitive ability and psychomotor tracking tests. International Journal of Selection and Assessment, 2, 209–216.

Ree, M. J., & Earles, A. A. (1991). Predicting training success: Not much more than g. Personnel Psychology, 44, 321–332.

Ree, M. J., Earles, J. A., & Teachout, M. S. (1994). Predicting job performance: Not much more than g. Journal of Applied Psychology, 79, 518–524.

Reeve, C. L., & Lam, H. (2005). The psychometric paradox of practice effects due to retesting: Measurement invariance and stable ability estimates in the face of observed score changes. Intelligence, 33, 535–549.

Resing, W. C. M. (1990). Intelligentie en leerpotentieel: Een onderzoek naar het leerpotentieel van jonge leerlingen uit het basis- en speciaal onderwijs [Intelligence and learning potential: A study into the learning potential of young students in basic and special education]. Amsterdam, the Netherlands: Swets.

Rushton, J. P. (1999). Secular gains in IQ are not related to the g factor and inbreeding depression, unlike black–white differences: A reply to Flynn. Personality and Individual Differences, 26, 381–389.

Rushton, J. P., Skuy, M., & Bons, T. A. (2004). Construct validity of Raven's Advanced Progressive Matrices for African and non-African engineering students in South Africa. International Journal of Selection and Assessment, 12(3), 220–229.

Schmidt, F. L., & Hunter, J. E. (1998). The validity and utility of selection methods in personnel psychology: Practical and theoretical implications of 85 years of research findings. Psychological Bulletin, 124(2), 262–274.

Schmidt, F. L., & Hunter, J. E. (1999). Theory testing and measurement error. Intelligence, 27(3), 183–198.

Schmidt, F. L., & Le, H. (2004). Software for the Hunter–Schmidt meta-analysis methods. University of Iowa, Department of Management and Organization, Iowa City, IA 42242.

Schroots, J. J. F., & van Alphen de Veer, R. J. (1979). LDT: Leidse Diagnostische Test. Deel 1: Handleiding [LDT: Leiden Diagnostic Test. Part 1: Manual]. Lisse, the Netherlands: Swets.

Shochet, I. M. (1986). Manifest and potential performance in advantaged and disadvantaged students. Unpublished DPhil dissertation, University of the Witwatersrand, South Africa.


Skuy, M., Gewer, A., Osrin, Y., Khunou, D., Fridjon, P., & Rushton, J. P. (2002). Effects of mediated learning experience on Raven's Matrices scores of African and non-African university students in South Africa. Intelligence, 30, 221–232.

Swanson, H. E., & Lussier, C. M. (2001). A selective synthesis of the experimental literature on dynamic assessment. Review of Educational Research, 71, 321–363.

Teasdale, T. W., & Owen, D. R. (1989). Continuing secular increase in intelligence and a stable prevalence of high intelligence levels. Intelligence, 13, 255–262.

Tuma, J. M., & Appelbaum, A. S. (1980). Reliability and practice effects of WISC-R IQ estimates in a normal population. Educational and Psychological Measurement, 40, 671–678.

te Nijenhuis, J., Tolboom, E., Resing, W., & Bleichrodt, N. (2004). Does cultural background influence the intellectual performance of children from immigrant groups? Validity of the RAKIT intelligence test for immigrant children. European Journal of Psychological Assessment, 20, 10–26.

te Nijenhuis, J., & van der Flier, H. (1997). Comparability of GATB scores for immigrants and majority group members: Some Dutch findings. Journal of Applied Psychology, 82, 675–687.

te Nijenhuis, J., Voskuijl, O. F., & Schijve, N. B. (2001). Practice and coaching on IQ tests: Quite a lot of g. International Journal of Selection and Assessment, 9, 302–308.

Thorndike, R. L. (1985). The central role of general ability in prediction. Multivariate Behavioral Research, 20, 241–254.

Tucker, L. R., Damarin, F., & Messick, S. (1966). A base-free measure of change. Psychometrika, 31(4), 457–473.

van der Doef, M. P., Kwint, J. M., & van der Koppel (1989). Wat leren moeilijk lerende kinderen van de WISC-R? [What do children who have difficulties in learning learn from the WISC-R?]. Kind en Adolescent, 10, 136–141.

United States Department of Labor. (1970). Manual for the USTES General Aptitude Test Battery. Section III: Development. Washington, DC: United States Department of Labor.

van Geffen (1972). De betrouwbaarheid van de GATB 1002-B op brugklasniveau [The reliability of the GATB 1002-B for the first class at secondary school]. Catholic University Nijmegen, the Netherlands: Psychology of Work and Organisation.

van Haasen, P. P., de Bruyn, E. E. J., Pijl, Y. J., Poortinga, Y. H., Lutje Spelberg, H. C., Vander Steene, G., et al. (1986). WISC-R: Wechsler Intelligence Scale for Children-Revised, Nederlandstalige uitgave [WISC-R: Wechsler Intelligence Scale for Children-Revised, Dutch edition]. Lisse, the Netherlands: Swets.

Wechsler, D. (1955). Manual for the Wechsler Adult Intelligence Scale. New York: The Psychological Corporation.

Wechsler, D. (1967). Manual for the Wechsler Preschool and Primary Scale of Intelligence. New York: The Psychological Corporation.

Wechsler, D. (1974). Manual for the Wechsler Intelligence Scale for Children-Revised. New York: The Psychological Corporation.

Wechsler, D. (1981). WAIS-R manual: Wechsler Adult Intelligence Scale-Revised. New York: The Psychological Corporation.

Wechsler, D. (1997). WAIS-III: Wechsler Adult Intelligence Scale-third edition, and WMS-III: Wechsler Memory Scale-third edition. Technical manual. New York: The Psychological Corporation.

Wicherts, J. W., Dolan, C. V., Oosterveld, P., van Baal, G. C. V., Boomsma, D. I., & Span, M. M. (2004). Are intelligence tests measurement invariant over time? Investigating the nature of the Flynn effect. Intelligence, 32(5), 509–537.

Yeld, N., & Haeck, W. (1997). Educational histories and academic potential: Can tests deliver? Assessment and Evaluation in Higher Education, 22, 5–16.

Zaaiman, H. (1998). Selecting students for Mathematics and Science: The challenge facing higher education in South Africa. Pretoria, South Africa: HSRC Publishers.

Zaaiman, H., van der Flier, H., & Thijs, G. D. (2001). Dynamic testing in selection for an educational programme: Assessing South African performance on the Raven Progressive Matrices. International Journal of Selection and Assessment, 9, 258–269.

Zolezzi, S. A. (1992). Alternative selection measures for university undergraduate admissions. Unpublished MEd dissertation, University of the Witwatersrand, South Africa.

Zolezzi, S. A. (1995). The effectiveness of dynamic assessment as an alternative aptitude testing strategy. Unpublished DPhil dissertation, University of South Africa, South Africa.


significantly lower (F(1, 91) = 28.33, p = .00) than those of Whites/Indians/Coloreds (M = 55.59, SD = 3.55). Although posttest scores of the experimental group (M = 51.65, SD = 5.53) were higher than those of the control group (M = 50.8, SD = 6.65), the difference between the two groups was nonsignificant (F(1, 91) = 0.85, p = .36).

The correlation between scores before and after the training was .84 (p = .00) for the experimental group and .90 (p = .00) for the control group, showing that the training had only a limited effect on the rank order of individuals' scores. This means that the test strongly, but not perfectly, measures the same constructs on both occasions.

16.2. Correlation between score gains and g loadedness

We estimated effect sizes for each of the four groups (race by condition) by computing the difference between mean pretest and posttest scores, divided by the standard deviation of the pretest scores of Black and White/Indian/Colored students, respectively. Finally, we calculated the correlations between the effect sizes and the g loadings taken from Lynn et al. Correlations were −.24 (p = .10) for the Black experimental group, −.21 (p = .20) for the White/Indian/Colored experimental group, −.08 (p = .59) for the Black control group, and −.41 (p = .01) for the White/Indian/Colored control group. Small sample sizes usually attenuate correlations (Hunter & Schmidt, 1990). Collapsing the groups indeed resulted in higher average correlations: −.39 for the complete experimental group and −.26 for the complete control group.
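The item-level procedure just described (a standardized gain per item, then a correlation of those gains with the item g loadings) can be sketched in plain Python as follows. The 0/1 item scores for four persons and three items and the item g loadings are hypothetical, the use of the sample (n − 1) standard deviation is our assumption, and the helper names are ours; the original analysis used the Skuy et al. (2002) data and g loadings taken from Lynn et al.

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def sample_sd(xs):
    m = mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))

def pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def item_gains(pre, post):
    """d_j = (posttest mean - pretest mean on item j) / pretest SD of item j.

    pre and post are person-by-item score matrices (lists of rows) for the
    same persons in the same order."""
    gains = []
    for j in range(len(pre[0])):
        pre_j = [row[j] for row in pre]
        post_j = [row[j] for row in post]
        sd = sample_sd(pre_j)
        if sd == 0:
            raise ValueError(f"item {j} has no pretest variance")
        gains.append((mean(post_j) - mean(pre_j)) / sd)
    return gains

# Hypothetical 0/1 item scores: four persons, three items.
pre = [[0, 1, 1], [1, 0, 1], [0, 0, 1], [1, 1, 0]]
post = [[1, 1, 1], [1, 1, 1], [0, 1, 1], [1, 1, 1]]
g_loadings = [0.7, 0.4, 0.6]  # hypothetical item g loadings

d = item_gains(pre, post)
r_gd = pearson(g_loadings, d)
print([round(x, 2) for x in d], round(r_gd, 2))
```

With only three items, even a near-perfect negative value such as the one this toy example produces carries almost no evidential weight.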

16.3. g loadings

Using the combined experimental and control group, a principal axis factor analysis on the pretest and posttest scores, respectively, resulted in a first unrotated factor explaining 22% of the variance in the pretest scores and 18% of the variance in the posttest scores. These findings suggest that the g loadedness of the RSPM decreased substantially after Mediated Learning Experience.
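The quantity compared here is the share of total variance explained by the first unrotated factor of a principal axis factor analysis. A minimal, dependency-free Python sketch of iterated principal axis factoring for a single factor is given below. The starting communalities (largest absolute off-diagonal correlation per row), the cap of communalities at 1, the convergence settings, and the shifted power-iteration eigen-solver are our own illustrative choices, not necessarily what the authors' software did; the correlation matrix is a hypothetical one-factor example. Running it separately on the pretest and posttest item correlation matrices would give the two shares being compared.

```python
import math

def leading_eigenpair(m, iters=1000, tol=1e-12):
    """Largest eigenvalue and unit eigenvector of a symmetric matrix.

    Power iteration on (m + I): with communalities in [0, 1] every eigenvalue
    of a reduced correlation matrix is >= -1, so the shift makes the
    algebraically largest eigenvalue the dominant one."""
    n = len(m)
    shifted = [[m[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    v = [1.0 / math.sqrt(n)] * n
    lam = 0.0
    for _ in range(iters):
        w = [sum(a * b for a, b in zip(row, v)) for row in shifted]
        new_lam = sum(a * b for a, b in zip(v, w))  # Rayleigh quotient (v is unit length)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
        converged = abs(new_lam - lam) < tol
        lam = new_lam
        if converged:
            break
    return lam - 1.0, v

def first_principal_axis(r, iters=500, tol=1e-8):
    """Iterated principal axis factoring, first factor only.

    Returns the unrotated loadings and the share of total variance they explain."""
    n = len(r)
    # Starting communalities: largest absolute off-diagonal correlation per row.
    h2 = [max(abs(r[i][j]) for j in range(n) if j != i) for i in range(n)]
    for _ in range(iters):
        reduced = [[h2[i] if i == j else r[i][j] for j in range(n)] for i in range(n)]
        lam, vec = leading_eigenpair(reduced)
        loadings = [math.sqrt(max(lam, 0.0)) * x for x in vec]
        # Crude guard against Heywood cases: communalities may not exceed 1.
        new_h2 = [min(l * l, 1.0) for l in loadings]
        converged = max(abs(a - b) for a, b in zip(new_h2, h2)) < tol
        h2 = new_h2
        if converged:
            break
    if sum(loadings) < 0:  # the sign of an eigenvector is arbitrary
        loadings = [-l for l in loadings]
    return loadings, sum(l * l for l in loadings) / n

# Hypothetical one-factor correlation matrix: r_ij = l_i * l_j off the diagonal.
true_loadings = [0.8, 0.7, 0.6, 0.5]
R = [[1.0 if i == j else true_loadings[i] * true_loadings[j] for j in range(4)]
     for i in range(4)]
loadings, share = first_principal_axis(R)
print([round(l, 2) for l in loadings], round(share, 3))
```

For this exactly one-factor matrix the iteration should recover loadings close to .8, .7, .6, and .5 and a share of about .435; real item data will not fit a single factor this neatly.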

16.4. Correlation between score gains and sum score

Correlating score gains with RSPM total scores resulted in values of −.60 (p = .00) for the Black experimental group, −.18 (p = .38) for the Black control group, −.82 (p = .00) for the White/Indian/Colored experimental group, and −.48 (p = .08) for the White/Indian/Colored control group. After the use of the correction formula of Tucker et al. (1966), these correlations became −.39, −.08, −.61, and −.35, respectively. Overall, these correlations show that low-g persons improved their scores more strongly than high-g persons.
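Part of a negative correlation between raw gain scores and pretest totals is expected on statistical grounds alone, which is presumably why a correction was applied. The Python sketch below does not reproduce the Tucker et al. (1966) base-free correction; it only shows, from summary statistics, the gain–pretest correlation implied by a given test–retest correlation and given pretest and posttest SDs. The .84 test–retest value matches the one reported for the experimental group, while the SDs (1.0, and a shrunken posttest SD of 0.8) are assumptions; the function name is ours.

```python
import math

def gain_pretest_correlation(r_xy, sd_pre, sd_post):
    """Correlation between the gain (post - pre) and the pretest score,
    implied by the test-retest correlation r_xy and the two SDs."""
    cov_gain_pre = r_xy * sd_pre * sd_post - sd_pre ** 2
    sd_gain = math.sqrt(sd_pre ** 2 + sd_post ** 2 - 2 * r_xy * sd_pre * sd_post)
    return cov_gain_pre / (sd_gain * sd_pre)

# Equal SDs: the correlation is negative whenever r_xy < 1.
print(round(gain_pretest_correlation(0.84, 1.0, 1.0), 2))
# A smaller posttest SD (shrinking score variance) makes it more negative.
print(round(gain_pretest_correlation(0.84, 1.0, 0.8), 2))
```

With equal SDs the implied value is about −.28, and with the smaller posttest SD about −.60, so a raw gain–pretest correlation mixes regression toward the mean with any genuine tendency of low scorers to gain more.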

17. Discussion

Skuy et al. (2002) hypothesized that the low-quality education of Blacks in South Africa would lead IQ tests to underestimate their cognitive abilities. Groups of Black and White/Indian/Colored students took Raven's Progressive Matrices twice and in between received Feuerstein's Mediated Learning Experience. The test scores went up substantially in all groups. Evidence for an authentic change in the g factor requires broad transfer, or generalizability, across a wide variety of cognitive performance. However, Skuy et al. showed that the gains did not generalize to scores on another, highly similar test or to external criteria, and were therefore hollow. As the score gains were in some cases quite large (14 IQ points for the Black experimental group), the question becomes: what is it that improved?

The findings show that the correlations between score gains and the g loadedness of the items were −.39 for the complete experimental group and −.26 for the complete control group. However, because the g loadings and gain scores are measured at the item level, their reliabilities are not high, resulting in substantial attenuation of the correlation between g and d. Moreover, the RSPM does not measure g perfectly: Jensen (1998a, p. 91) estimates its g loading at .83. When we estimate the reliability of the g vector at .70 and the reliability of the gain score vector at .50, corrections for unreliability and for deviation from perfect construct validity of g alone result in estimated true correlations of −.80 and −.53, respectively. These values should be taken as underestimates; controlling for additional artifacts will bring them closer to the very strong negative correlation found in the meta-analysis.
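The arithmetic behind these corrected values appears to be the classical correction for attenuation, followed by a division by the RSPM's g loading to adjust for imperfect construct validity. Assuming the corrections combine multiplicatively in this way (our reading of the text), a short Python check is:

```python
import math

def corrected_vector_correlation(r_obs, rel_g, rel_d, g_validity=1.0):
    """Correct an observed vector correlation for unreliability of both
    vectors and for imperfect construct validity of the g measure."""
    return r_obs / (math.sqrt(rel_g * rel_d) * g_validity)

for r_obs in (-0.39, -0.26):
    est = corrected_vector_correlation(r_obs, rel_g=0.70, rel_d=0.50, g_validity=0.83)
    print(f"{r_obs:.2f} -> {est:.2f}")
```

This gives about −.79 and −.53; the small difference from the reported −.80 may simply reflect rounding of the observed correlation.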

295 J. te Nijenhuis et al. / Intelligence 35 (2007) 283–300

The findings suggest that after training the g loadedness of the test decreased substantially. We found substantial negative correlations between gain scores and RSPM total scores. Table 4 shows that the total score variance decreased after training, which is in line with low-g subjects increasing more than high-g subjects. Since, as a rule, high-g individuals profit the most from training, as reflected in the ubiquitous positive correlation between IQ scores and training performance (Jensen, 1980; Schmidt & Hunter, 1998), these findings could be interpreted as an indication that Feuerstein's Mediated Learning Experience is not g-loaded, in contrast with regular trainings, which are clearly g-loaded. Substantial negative correlations between gain scores and RSPM total scores are no definite proof of this hypothesis, but they are in line with it. Additional support for our hypothesis that the Feuerstein training has little or no g loadedness comes from Coyle (2006), who showed that gain scores loaded virtually zero on the g factor. Moreover, Skuy et al. reported that the predictive validity of their measure did not increase when the second Raven score was used. The fact that individuals with low g gained more than those with high g could be interpreted as an indication that the Mediated Learning Experience was not g-loaded. It should be noted, however, that Feuerstein most likely did not intend his intervention to be g-loaded; he was interested in increasing the performance of low scorers on both tests and external criteria.
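The link between a smaller post-training variance and larger gains for low scorers follows from the variance of a sum. With X the pretest total score, G the gain, and Y = X + G the posttest score (symbols introduced here only for illustration):

```latex
\operatorname{Var}(Y) = \operatorname{Var}(X) + \operatorname{Var}(G) + 2\,\operatorname{Cov}(X, G)
\quad\Longrightarrow\quad
\operatorname{Var}(Y) < \operatorname{Var}(X) \iff \operatorname{Cov}(X, G) < -\tfrac{1}{2}\operatorname{Var}(G)
```

A drop in variance therefore requires a negative pretest–gain covariance large enough to outweigh the variance the gains add on their own. Under a simple illustrative model in which everyone gains the same true amount and pretest and posttest have equal error variance σ², Cov(X, G) = −σ² and Var(G) = 2σ², so the two sides are exactly equal: within that model, measurement error alone leaves the total variance unchanged, and a genuine decrease points to real differential gains.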

18 General discussion

IQ scores are by far the best general predictor of success in education, job training, and work. However, there are many ways in which these IQ scores can be increased, for instance by means of retesting or participating in a learning potential training program. What conclusions can be drawn from such score gains? Jensen's (1998a) hypothesis that the effects of training on abilities can be summarized in terms of Carroll's three-stratum hierarchical factor model was tested in a meta-analysis on test–retest data using Dutch, British, and American test batteries, and with learning potential data from South Africa using Raven's Progressive Matrices. The meta-analysis convincingly shows that test–retest score gains are not g-loaded. The findings from the learning potential study are clearly in line with this: when the attenuation caused by unreliability and other artifacts is taken into account, the correlation between g loadings of items and gains on items has a value somewhat comparable to the one found in the meta-analysis for test batteries. The data suggest that the g loadedness of item scores decreases after the intervention training. Te Nijenhuis et al.'s (2001) finding that practice and coaching reduced the g loadedness of their test scores strengthens the present findings using item scores. The findings show that it is not the high-g participants who increase their scores the most, as is common in training situations, but rather the low-g persons who show the largest increases. This suggests that the intervention training is not g-loaded.

Our findings fit quite well with the hierarchical model of intelligence. The generalizability of test scores resides predominantly in the g component, whereas the test-specific ability component and the narrow ability component are virtually non-generalizable. This is evidenced, for instance, by the earlier finding that adding verbal tests or numerical tests to a g score resulted in only very small incremental validity (Ree & Earles, 1991; Ree et al., 1994). Additionally, Ericsson and Lehmann (1996) reported immense gains on a memory task focusing on one narrow ability but did not find any improvement on comparable memory tasks focusing on another narrow ability. As the score gains are not related to g, the generalizable g component decreases; and since the Feuerstein training itself is quite possibly not g-loaded, it is easy to understand why the score gains did not generalize to scores on the cognitively loaded Representational Stencil Design Test. For a similar reason, the score gains did not generalize to g-loaded external criteria: the correlation of the RSPM scores with performance in the end-of-year psychology examination did not significantly improve after mediation. Reeve and Lam (2005) claimed that retesting does not change the nature of what is being tested, but our findings suggest the opposite.

19 Limitations of the studies

Our meta-analysis and our analysis of the South African study rely strongly on the method of correlated vectors (MCV), which has recently been shown to have limitations. Dolan and Lubke (2001) showed that, when comparing groups, substantial positive vector correlations can still be obtained even when groups differ not only on g but also on factors uncorrelated with g. Ashton and Lee (2005) showed that associations of a variable with non-g sources of variance can produce a vector correlation of zero even when the variable is strongly associated with g. They suggest that the g loadings of a subtest are sensitive to the nature of the other subtests in a battery, so that a specific sample of subtests may cause a spurious correlation between the vectors. Notwithstanding these limitations, studies using MCV continue to appear (see, for instance, Colom, Jung, & Haier, in press; Hartmann, Kruuse, & Nyborg, in press; Lee et al., 2006). The outcomes of our meta-analysis of a large number of studies using MCV may make an interesting contribution to the discussion on the limitations of the method.

A principle of meta-analysis is that the amount of information contained in one individual study is quite modest. Therefore, one should carry out an analysis of all studies on one topic and correct for artifacts, which strongly increases the amount of information. The fact that our meta-analytical value of r = −1.06 is virtually identical to the theoretically expected correlation between g and d of −1.00 holds some promise that a psychometric meta-analysis of studies using MCV is a powerful way of reducing some of the limitations of MCV. An alternative methodological approach is to limit oneself to the rare datasets that enable the use of structural equation modeling. However, from a meta-analytical point of view, these studies yield only a quite modest amount of information.
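A corrected value beyond −1.00, such as the −1.06 reported here, is possible because a psychometric meta-analysis divides a mean correlation that still contains sampling error by the combined attenuation due to artifacts. The following is a minimal sketch of two central steps in the Hunter and Schmidt (2004) approach: a sample-size-weighted mean r, followed by division by the product of artifact attenuation factors. The study correlations, sample sizes, and reliabilities are made-up illustrative values, and the estimation of residual variance across studies is omitted.

```python
import math

def weighted_mean_r(rs, ns):
    """Sample-size-weighted mean correlation across studies (bare-bones step)."""
    if len(rs) != len(ns) or not rs:
        raise ValueError("need one sample size per correlation")
    return sum(r * n for r, n in zip(rs, ns)) / sum(ns)

def correct_mean_r(mean_r, attenuation_factors):
    """Divide by the compound attenuation factor A (product of the individual factors)."""
    return mean_r / math.prod(attenuation_factors)

# Illustrative values only, not data from the meta-analysis.
rs = [-0.80, -0.90, -0.70]
ns = [100, 300, 100]
mean_r = weighted_mean_r(rs, ns)
A = [math.sqrt(0.85), math.sqrt(0.75)]  # square roots of two assumed vector reliabilities
print(round(mean_r, 2), round(correct_mean_r(mean_r, A), 2))  # prints: -0.84 -1.05
```

Because the correction only inflates the magnitude of the observed mean, sampling error in that mean can push the corrected estimate slightly past −1.00; such a value is best read as indistinguishable from −1.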

Additional meta-analyses of studies employing MCV are necessary to establish the validity of the combination of MCV and psychometric meta-analysis. Most likely, many would agree that a high positive meta-analytical correlation between measures of g and measures of another construct implies that g plays a major role, and that a meta-analytical correlation of −1.00 implies that g plays no role. However, it is not clear what value of the meta-analytical correlation to expect from MCV when g plays only a modest role. After the present meta-analysis on a construct that clearly has an inverse relationship with g, it would be informative to carry out meta-analyses of studies on variables that are strongly linked to g and on variables that are modestly linked to g. An example of the latter would be secular score gains, which, according to Lynn's (1990) nutrition theory, should be modestly g-loaded.

The sample sizes in the South African study are not large, but they are still larger than those in many other studies of learning potential, where an N ≈ 10 is not unusual. A reanalysis of the many existing studies on dynamic testing could lead to a meta-analysis with a large combined N. The mean posttest score was quite high, so a ceiling effect may have occurred for the White/Indian/Colored group, leading to an underestimation of the experimental score gain for this group.

Instead of testing the hypothesis with a strongly unidimensional test such as the RSPM, it would be better to use a multidimensional test. Moreover, a large sample size would allow the use of more rigorous data-analytical techniques, leading to more definitive results. However, to the best of our knowledge, datasets meeting these requirements do not exist, and the Skuy et al. study is arguably the best South African learning potential study.

20 Score gains as low-quality measures of motivation

As criterion-related validity is strongly dependent on g, te Nijenhuis et al.'s finding of lowered g loadings after training should result in lowered criterion-related validity. However, the empirical findings show the opposite: virtually all test–retest and test preparation studies on cognitive tests and scholastic aptitude tests that reported criterion-related validities for both administrations demonstrate small to modest increases in criterion-related validity for the second or third test score (see Allalouf & Ben-Shakhar, 1998; Bashi, 1976; Coyle, 2006; Hausknecht, Trevor, & Farr, 2002; Jones, 1986; Linn, 1977; Olsen & Schrader, 1959; Ortar, 1960; Powers, 1985; Reeve & Lam, 2005). In the carefully designed study of a university entrance test by Allalouf and Ben-Shakhar (1998), the experimental group received an intensive 40-h test coaching program, while the control group did not. The criterion-related validity for the retest increased for both groups. Most importantly, the increase was the same; it was not larger for the experimental group.

In a little-known but carefully designed large-scale learning potential study, Resing (1990, see Table 4.23) compared an experimental group that received a pretest, a learning potential training, and a posttest against a control group that received only the pretest and the posttest. The mean criterion-related validity of the various second scores was .62 for both the experimental and the control group. Learning potential training did not result in incremental criterion-related validity over and above the validity resulting from simply retesting. The findings of both Resing and Allalouf and Ben-Shakhar suggest that cognitive interventions do not increase criterion-related validity more than simple retesting does.

g and the personality measure conscientiousness have been shown to make an excellent combination of predictors (Schmidt & Hunter, 1998). Conscientiousness represents, among other characteristics, persistence, a will to achieve, and the ability to focus effort on a goal. A field study on test preparation using actual job applicants (Clause, Delbridge, Schmitt, Chan, & Jennings, 2001) showed that motivation to perform well on the test correlated .25 with test performance. One could speculate that score increases do not reflect a true cognitive component but rather become low-quality measures of motivation. Further, since the increase in validity due to retesting and learning potential training is modest in comparison to the large increase obtainable from the use of personality questionnaires, personality testing might provide a less expensive and more accurate alternative.
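Why two predictors can combine so well is captured by the standard formula for the multiple correlation of a criterion with two predictors: when the predictors are only weakly correlated with each other, their validities add almost independently. The sketch below evaluates that formula; all validities and intercorrelations are hypothetical placeholders chosen for illustration, not the estimates reported by Schmidt and Hunter (1998).

```python
from math import sqrt

def multiple_r(r1y, r2y, r12):
    """Multiple correlation R of criterion y with two predictors 1 and 2.

    R^2 = (r1y^2 + r2y^2 - 2 * r1y * r2y * r12) / (1 - r12^2)
    """
    if abs(r12) >= 1:
        raise ValueError("predictors must not be perfectly correlated")
    r_squared = (r1y**2 + r2y**2 - 2 * r1y * r2y * r12) / (1 - r12**2)
    return sqrt(r_squared)

r_g, r_c = 0.50, 0.30  # hypothetical validities of g and conscientiousness
for r12 in (0.0, 0.2, 0.5):  # hypothetical predictor intercorrelations
    R = multiple_r(r_g, r_c, r12)
    print(f"r12={r12:.1f}  R={R:.3f}  gain over g alone={R - r_g:.3f}")
```

The increment from the second predictor shrinks quickly as it overlaps more with the first, which is why a measure assumed to be only weakly related to g, such as a conscientiousness scale, can add more validity than a second, largely g-loaded test score.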

21 Effectiveness of various training formats

Components of the mediation training used by Skuy et al. (2002) are similar to the test training used in te Nijenhuis et al. (2001). Both the Dutch training and the South African training took 3 h, but whereas the Dutch training focused on two different test formats, the South African training dealt with only one test format. The test training by Lloyd and Pidgeon (1961) took even less time, namely two half-hour segments, each focusing on one test format. The effect sizes in all studies were roughly comparable. This suggests that the methodologies employed by te Nijenhuis et al. and Lloyd and Pidgeon were more efficient than those used by Skuy et al. It is possible that the components of the mediation training that are not present in the other two training formats are not effective in raising test scores and could therefore be left out. If so, it might be possible to increase scores on the RSPM by one SD with a relatively simple 1-h training.

22 Generalizability of findings

Can these findings of hollow score gains after test–retest, test practice, and Mediated Learning Experience training be generalized to other studies in which training-induced score gains were found? Ericsson and Lehmann (1996) reported tremendous score increases after intensive training on numeric memory tests, but these gains did not generalize in the least to verbal memory tests. Such gains on one narrow ability do not generalize to another narrow ability clustering under the same broad ability and are therefore hollow. Similarly, Jensen (1998b) showed that score gains due to adoption were not on the g factor and were therefore most likely hollow.

Rushton (1999) argued that intergenerational score gains are not linked to g, suggesting that the Flynn effect may be empty, but he was strongly criticized by Flynn (1999, 2000). In studies on the Flynn effect, score gains found in cross-sectional studies are largest on the RSPM (Flynn, 1987). Lynn (1998) has suggested that a substantial part of these intergenerational score gains on the RSPM is generalizable (they do reflect higher g), but that the remaining part is hollow and should be interpreted as schooling effects. The RSPM does require the application of the mathematical principles of addition, subtraction, progression, and the distribution of values. In the three decades (1950s–1980s) over which these increases in RSPM scores occurred, increasing proportions of 15- to 18-year-olds remained in school, where they learned math skills that they could then apply to solving matrix problems. Our findings could be interpreted as support for Lynn's hypothesis of the partial hollowness of score gains on the RSPM. Notwithstanding the high g loading of the RSPM sum score, it is quite sensitive to test–retest effects and training effects. Some studies on the Flynn effect (Lynn & Hampson, 1986; Teasdale & Owen, 1989) show that the increase in scores is largely concentrated in the lower segments of the IQ distribution. Our finding that low scorers show the largest gains after training may additionally support the notion that part of the Flynn effect on the RSPM is hollow. Finally, Wicherts et al.'s (2004) findings show that in some of their datasets the secular score gains are most strongly linked to broad, narrow, and test-specific abilities, indicating that an important part of the gains is non-generalizable.

Ceci (1991) showed that increased schooling leads to higher IQ scores, but are these gains highly specific or predominantly generalizable? It would be interesting to apply the techniques we used in this study to the findings of previous intervention studies. It may be that biological interventions (such as diet, vitamin supplements, or vaccination against infectious disease), rather than psychological or educational interventions, are the most cost-effective method of producing true changes in g and broad abilities. There may be a biological barrier between the first stratum and the second stratum that restricts the effects of behavioral interventions to narrow abilities and test specificities.

Acknowledgement

We would like to thank Mervyn Skuy for his permission to use his dataset.

Thanks to Marié de Beer, Raegan Murphy, Welko Tomic, Art Jensen, and Frank Schmidt for feedback on previous versions of this paper.

Thanks to Arne Evers, Wilma Resing (Dutch Test Committee), and Andress Kooij (Harcourt) for also helping in locating test–retest studies.

References

Ackerman, P. L. (1986). Individual differences in information processing: An investigation of intellectual abilities. Intelligence, 10, 101–139.

Ackerman, P. L. (1987). Individual differences in skill learning: An integration of psychometric and information processing perspectives. Psychological Bulletin, 102, 3–27.

Allalouf, A., & Ben-Shakhar, G. (1998). The effect of coaching on the predictive validity of scholastic aptitude tests. Journal of Educational Measurement, 35(1), 31–47.

Ashton, M. C., & Lee, K. (2005). Problems with the method of correlated vectors. Intelligence, 33, 431–444.

Bashi, Y. (1976). Verbal and non-verbal abilities of 4th, 6th and 8th grade students in the Arab educational system in Israel. Jerusalem: Hebrew University, School of Education.


Bleichrodt, N., Resing, W. C. M., Drenth, P. J. D., & Zaal, J. N. (1987). Intelligentie-meting bij kinderen: Empirische en methodologische verantwoording van de geReviseerde Amsterdamse Kinder Intelligentie Test [Measuring the intelligence of children: Empirical and methodological justification of the Revised Amsterdam Children Intelligence Test]. Lisse, the Netherlands: Swets.

Bennett, G. K., Seashore, H. G., & Wesman, A. G. (1974). Differential Aptitude Tests (5th ed.): Manual. New York: The Psychological Corporation.

Boeyens, J. C. A. (1989). Learning potential: An empirical investigation. Pretoria, South Africa: Human Science Research Council.

Bosch, F. (1973). Inventarisatie, beschrijving en onderzoek m.b.t. de wijzigingen van de GATB, incl. test-hertest onderzoek (No. Pz3bRp0120) [Stock-taking, description, and research concerning the modifications of the GATB, including a test–retest study]. Utrecht, the Netherlands: Nederlandse Spoorwegen.

Carroll, J. B. (1993). Human cognitive abilities: A survey of factor-analytic studies. Cambridge: Cambridge University Press.

Ceci, S. J. (1991). How much does schooling influence general intelligence and its cognitive components? A reassessment of the evidence. Developmental Psychology, 27, 703–722.

Christian, K., Bachman, H. J., & Morrison, F. J. (2001). Schooling and cognitive development. In R. J. Sternberg & E. L. Grigorenko (Eds.), Environmental effects on cognitive abilities (pp. 287–335). Mahwah, NJ: Erlbaum.

Clause, C. S., Delbridge, K., Schmitt, N., Chan, D., & Jennings, D. (2001). Test preparation activities and employment test performance. Human Performance, 14, 149–167.

Cohen, J. (1988). Statistical power analysis for the behavioral sciences. Hillsdale: Lawrence Erlbaum.

Colom, R., Jung, R. E., & Haier, R. J. (in press). Finding the g-factor in brain structure using the method of correlated vectors. Intelligence.

Covin, T. A. (1977). Stability of the WISC-R for 9-year-olds with learning difficulties. Psychological Reports, 40, 1297–1298.

Coyle, T. R. (2006). Test–retest changes on scholastic aptitude tests are not related to g. Intelligence, 34, 15–27.

Cronbach, L. J. (1990). Essentials of psychological testing. New York: HarperCollins.

de Villiers, A. B. (1999). Disadvantaged students' academic performance: Analysing the zone proximal development. Unpublished DPhil thesis, University of Cape Town, South Africa.

de Wolff, C. J., & Buiten, B. (1963). Een factoranalyse van vier testbatterijen [A factor analysis of four test batteries]. Nederlands Tijdschrift voor Psychologie, 18, 220–239.

Dolan, C. V., & Lubke, G. (2001). Viewing Spearman's hypothesis from the perspective of multigroup PCA: A comment on Schönemann's criticism. Intelligence, 29, 231–245.

Drenth, P. J. D., Petrie, J. F., & Bleichrodt, N. (1968). Handleiding bij de Amsterdamse Kinder Intelligentie Test [Manual of the Amsterdam Children Intelligence Test]. Amsterdam: Vrije Universiteit.

Elliott, C. D. (1983). British Ability Scales. Manual 2: Technical handbook. Windsor, Great Britain: NFER-Nelson.

Engelbrecht, M. (1999). Leerpotensiaal as voorspeller van akademiese sukses van universiteitsstudente [Learning potential as predictor of the academic success of university students]. Unpublished DPhil thesis, Potchefstroom University for Christian Higher Education, South Africa.

Ericsson, K. A., & Lehmann, A. C. (1996). Expert and exceptional performance: Evidence of maximal adaptation to task constraints. Annual Review of Psychology, 47, 273–305.

Evers, A., & Lucassen, W. (1991). Handleiding DAT 83: Differentiële Aanleg Testserie [Manual DAT 83: Differential Aptitude Test series]. Amsterdam: Swets.

Fleishman, E. A., & Hempel, W. E. (1955). The relation between abilities and improvement with practice in a visual discrimination reaction task. Journal of Experimental Psychology, 49, 301–312.

Flynn, J. R. (1987). Massive IQ gains in 14 nations: What IQ tests really measure. Psychological Bulletin, 101, 171–191.

Flynn, J. R. (1999). Evidence against Rushton: The genetic loading of WISC-R subtests and the causes of between-group IQ differences. Personality and Individual Differences, 26, 373–379.

Flynn, J. R. (2000). IQ gains, WISC subtests and fluid g: g theory and the relevance of Spearman's hypothesis to race. In G. R. B. J. Goode (Ed.), The nature of intelligence (pp. 202–227). New York: Wiley.

Gaydon, V. P. (1988). Predictors of performance of disadvantaged adolescents on the Soweto/Alexandra gifted child programme. Unpublished MEd dissertation, University of the Witwatersrand, South Africa.

Gottfredson, L. S. (1997). Why g matters: The complexity of everyday life. Intelligence, 24(1), 79–132.

Gottfredson, L. S. (2002). g: Highly general and highly practical. In R. J. Sternberg & E. L. Grigorenko (Eds.), The general intelligence factor: How general is it? (pp. 331–380). Mahwah, NJ: Erlbaum.

Grigorenko, E. L., & Sternberg, R. J. (1998). Dynamic testing. Psychological Bulletin, 124, 75–111.

Haeck, W., Yeld, N., Conradie, J., Robertson, N., & Shall, A. (1997). A developmental approach to mathematics testing for university admissions and course placement. Educational Studies in Mathematics, 33, 71–91.

Hartmann, P., Kruuse, N. H. S., & Nyborg, H. (in press). Testing the cross-racial generality of Spearman's hypothesis in two samples. Intelligence.

Hausknecht, J. P., Trevor, C. O., & Farr, J. L. (2002). Retaking ability tests in a selection setting: Implications for practice effects, training performance, and turnover. Journal of Applied Psychology, 87(2), 243–254.

Hunter, J. E., & Schmidt, F. L. (1990). Methods of meta-analysis. London: Sage.

Hunter, J. E., & Schmidt, F. L. (2004). Methods of meta-analysis (2nd ed.). London: Sage.

Jensen, A. R. (1980). Bias in mental testing. London: Methuen.

Jensen, A. R. (1985). The nature of the black–white difference on various psychometric tests: Spearman's hypothesis. Behavioral and Brain Sciences, 8, 193–263.

Jensen, A. R. (1998a). The g factor: The science of mental ability. London: Praeger.

Jensen, A. R. (1998b). Adoption data and two g-related hypotheses. Intelligence, 25, 1–6.

Johnson, W., Bouchard, T. J., Jr., Krueger, R. F., McGue, M., & Gottesman, I. I. (2004). Just one g: Consistent results from three test batteries. Intelligence, 32, 95–107.

Johnson, W., te Nijenhuis, J., & Bouchard, T. J., Jr. (in press). Replication of the hierarchical visual-perceptual-image rotation model in de Wolff and Buiten's (1963) battery of 46 tests of mental ability. Intelligence.

Jones, R. J. (1986). A comparison of the predictive validity of the MCAT for coached and uncoached students. Journal of Medical Education, 61, 335–338.

Kaufman, A. S., & Kaufman, N. L. (1983). K-ABC: Kaufman Assessment Battery for Children. Interpretive manual. Circle Pines, MN: AGS.


Kooij, A. P., Rolfhus, E., Wilkins, C., Yang, Z., & Zhu, J. (2005). WAIS-III Nederlandstalige bewerking: Technisch rapport hernormering [WAIS-III Dutch adaptation: Technical report on renorming]. Amsterdam: Harcourt.

Kort, W., Schittekatte, M., Dekker, P. H., Verhaeghe, P., Compaan, E. L., Bosmans, M., & Vermeir, G. (2005). WISC-III NL: Wechsler Intelligence Scale for Children, Derde Editie NL. Handleiding en verantwoording [The Dutch WISC-III: Wechsler Intelligence Scale for Children, Third Edition, for the Netherlands. Manual and justification]. Amsterdam: NIP.

Kulik, J. A., Bangert-Drowns, R. L., & Kulik, C. C. (1984). Effectiveness of coaching for aptitude tests. Psychological Bulletin, 95, 179–188.

Kulik, J. A., Kulik, C. C., & Bangert, R. L. (1984). Effects of practice on aptitude and achievement test scores. American Educational Research Journal, 21, 435–447.

Lee, K. H., Choi, Y. Y., Gray, J. R., Cho, S. H., Chae, J.-H., Lee, S., et al. (2006). Neural correlates of superior intelligence: Stronger recruitment of posterior parietal cortex. NeuroImage, 29(2), 578–586.

Linn, R. L. (1977). On the treatment of multiple scores for Law School Admission Test repeaters (Report LSAC-77-4). In Law School Admission Council, Reports of LSAC sponsored research: Volume III, 1975–1977. Princeton, NJ: Law School Admission Council.

Lipson, L. E. (1992). Relationship of static and dynamic measures to scholastic achievement of black pupils. Unpublished MEd dissertation, University of Witwatersrand, South Africa.

Lloyd, F., & Pidgeon, D. A. (1961). An investigation into the effects of coaching on non-verbal test material with European, Indian and African children. British Journal of Educational Psychology, 31, 145–151.

Luteijn, F., & Barelds, D. P. H. (2005). GIT2: Groninger Intelligentie Test 2 [GIT2: Groningen Intelligence Test 2]. Amsterdam: Harcourt.

Lynn, R. (1990). The role of nutrition in secular increases in intelligence. Personality and Individual Differences, 11, 273–285.

Lynn, R. (1998). In support of the nutrition theory. In U. Neisser (Ed.), The rising curve: Long-term gains in IQ and related measures (pp. 207–215). Washington, DC: American Psychological Association.

Lynn, R., Allik, J., & Irwing, P. (2004). Sex differences on three factors identified in Raven's Standard Progressive Matrices. Intelligence, 32, 411–424.

Lynn, R., & Hampson, S. (1986). The rise of national intelligence: Evidence from Britain, Japan and the USA. Personality and Individual Differences, 7, 23–32.

Matarazzo, J. D., Carmody, T. P., & Jacobs, L. D. (1980). Test–retest reliability and stability of the WAIS: A literature review with implications for clinical practice. Journal of Clinical Neuropsychology, 2(2), 89–105.

McCormick, B. K., Dunlap, W. P., Kennedy, R. S., & Jones, M. B. (1983). The effects of practice on the Armed Forces Vocational Aptitude Test Battery (Technical Report 602). US Army Research Institute for the Behavioral and Social Sciences.

Mulder, J. L., Dekker, R., & Dekker, P. H. (2004). Kaufman Intelligentietest voor adolescenten en volwassenen (KAIT): Handleiding [Kaufman Intelligence Test for adolescents and adults (KAIT): Manual]. Leiden, the Netherlands: PITS.

Murphy, R. (2002). A review of South African research in the field of dynamic assessment. Unpublished MA dissertation, University of Pretoria (available online from http://upetd.up.ac.za/thesis/available/etd-05042002-161239).

Nel, A. (1997). Die voorspelling van akademiese sukses binne konteks van 'n alternatiewe universiteitstoelatingsbeleid [The prediction of academic success within the context of an alternative policy of university admission]. Unpublished MA dissertation, Rand Afrikaans University, South Africa.

Neubauer, A. C., & Freudenthaler, H. H. (1994). Reaction time in a sentence-picture verification test and intelligence: Individual strategies and effects of extended practice. Intelligence, 19, 193–218.

Nunnally, J. C., & Bernstein, I. H. (1994). Psychometric theory (3rd ed.). New York: McGraw-Hill.

Olsen, M., & Schrader, W. B. (1959). The use of preliminary and final Scholastic Aptitude Test scores in predicting college grades (College Entrance Examination Board Research and Development Reports and Statistical Reports 59-19). Princeton, NJ: Educational Testing Service.

Ortar, G. R. (1960). Improving test validity by coaching. Educational Research, 2, 137–142.

Powers, D. E. (1985). Effects of test preparation on the validity of a graduate admissions test. Applied Psychological Measurement, 9, 179–190.

Raven, J., Raven, J. C., & Court, J. H. (2000). Standard Progressive Matrices. Raven manual: Section 3. Oxford: Oxford Psychologists Press.

Ree, M. J., & Carretta, T. R. (1994). The correlation of general cognitive ability and psychomotor tracking tests. International Journal of Selection and Assessment, 2, 209–216.

Ree, M. J., & Earles, J. A. (1991). Predicting training success: Not much more than g. Personnel Psychology, 44, 321–332.

Ree, M. J., Earles, J. A., & Teachout, M. S. (1994). Predicting job performance: Not much more than g. Journal of Applied Psychology, 79, 518–524.

Reeve, C. L., & Lam, H. (2005). The psychometric paradox of practice effects due to retesting: Measurement invariance and stable ability estimates in the face of observed score changes. Intelligence, 33, 535–549.

Resing, W. C. M. (1990). Intelligentie en leerpotentieel: Een onderzoek naar het leerpotentieel van jonge leerlingen uit het basis- en speciaal onderwijs [Intelligence and learning potential: A study into the learning potential of young students in basic and special education]. Amsterdam, the Netherlands: Swets.

Rushton, J. P. (1999). Secular gains in IQ are not related to the g factor and inbreeding depression, unlike black–white differences: A reply to Flynn. Personality and Individual Differences, 26, 381–389.

Rushton, J. P., Skuy, M., & Bons, T. A. (2004). Construct validity of Raven's Advanced Progressive Matrices for African and non-African engineering students in South Africa. International Journal of Selection and Assessment, 12(3), 220–229.

Schmidt, F. L., & Hunter, J. E. (1998). The validity and utility of selection methods in personnel psychology: Practical and theoretical implications of 85 years of research findings. Psychological Bulletin, 124(2), 262–274.

Schmidt, F. L., & Hunter, J. E. (1999). Theory testing and measurement error. Intelligence, 27(3), 183–198.

Schmidt, F. L., & Le, H. (2004). Software for the Hunter–Schmidt meta-analysis methods. University of Iowa, Department of Management and Organization, Iowa City, IA 42242.

Schroots, J. J. F., & van Alphen de Veer, R. J. (1979). LDT: Leidse Diagnostische Test. Deel 1: Handleiding [LDT: Leiden Diagnostic Test. Part 1: Manual]. Lisse, the Netherlands: Swets.

Shochet, I. M. (1986). Manifest and potential performance in advantaged and disadvantaged students. Unpublished DPhil dissertation, University of the Witwatersrand, South Africa.


Skuy, M., Gewer, A., Osrin, Y., Khunou, D., Fridjon, P., & Rushton, J. P. (2002). Effects of mediated learning experience on Raven's Matrices scores of African and non-African university students in South Africa. Intelligence, 30, 221–232.

Swanson, H. E., & Lussier, C. M. (2001). A selective synthesis of the experimental literature on dynamic assessment. Review of Educational Research, 71, 321–363.

Teasdale, T. W., & Owen, D. R. (1989). Continuing secular increase in intelligence and a stable prevalence of high intelligence levels. Intelligence, 13, 255–262.

Tuma, J. M., & Appelbaum, A. S. (1980). Reliability and practice effects of WISC-R IQ estimates in a normal population. Educational and Psychological Measurement, 40, 671–678.

te Nijenhuis, J., Tolboom, E., Resing, W., & Bleichrodt, N. (2004). Does cultural background influence the intellectual performance of children from immigrant groups? Validity of the RAKIT intelligence test for immigrant children. European Journal of Psychological Assessment, 20, 10–26.

te Nijenhuis, J., & van der Flier, H. (1997). Comparability of GATB scores for immigrants and majority group members: Some Dutch findings. Journal of Applied Psychology, 82, 675–687.

te Nijenhuis, J., Voskuijl, O. F., & Schijve, N. B. (2001). Practice and coaching on IQ tests: Quite a lot of g. International Journal of Selection and Assessment, 9, 302–308.

Thorndike, R. L. (1985). The central role of general ability in prediction. Multivariate Behavioral Research, 20, 241–254.

Tucker, L. R., Damarin, F., & Messick, S. (1966). A base-free measure of change. Psychometrika, 31(4), 457–473.

van der Doef, M. P., Kwint, J. M., & van der Koppel (1989). Wat leren moeilijk lerende kinderen van de WISC-R [What do children who have difficulties in learning learn from the WISC-R?]. Kind en Adolescent, 10, 136–141.

United States Department of Labor (1970). Manual for the USTES General Aptitude Test Battery. Section III: Development. Washington, DC: United States Department of Labor.

van Geffen (1972). De betrouwbaarheid van de GATB 1002-B op brugklasniveau [The reliability of the GATB 1002-B for the first class at secondary school]. Catholic University Nijmegen, the Netherlands: Psychology of Work and Organisation.

van Haasen, P. P., de Bruyn, E. E. J., Pijl, Y. J., Poortinga, Y. H., Lutje Spelberg, H. C., Vander Steene, G., et al. (1986). WISC-R: Wechsler Intelligence Scale for Children-Revised, Nederlandstalige uitgave [WISC-R: Wechsler Intelligence Scale for Children-Revised, Dutch edition]. Lisse, the Netherlands: Swets.

Wechsler, D. (1955). Manual for the Wechsler Adult Intelligence Scale. New York: The Psychological Corporation.

Wechsler, D. (1967). Manual for the Wechsler Preschool and Primary Scale of Intelligence. New York: The Psychological Corporation.

Wechsler, D. (1974). Manual for the Wechsler Intelligence Scale for Children-Revised. New York: The Psychological Corporation.

Wechsler, D. (1981). WAIS-R manual: Wechsler Adult Intelligence Scale-Revised. New York: The Psychological Corporation.

Wechsler, D. (1997). WAIS-III: Wechsler Adult Intelligence Scale-third edition and WMS-III: Wechsler Memory Scale-third edition. Technical manual. New York: The Psychological Corporation.

Wicherts, J. W., Dolan, C. V., Oosterveld, P., van Baal, G. C. V., Boomsma, D. I., & Span, M. M. (2004). Are intelligence tests measurement invariant over time? Investigating the nature of the Flynn effect. Intelligence, 32(5), 509–537.

Yeld, N., & Haeck, W. (1997). Educational histories and academic potential: Can tests deliver? Assessment and Evaluation in Higher Education, 22, 5–16.

Zaaiman, H. (1998). Selecting students for mathematics and science: The challenge facing higher education in South Africa. Pretoria, South Africa: HSRC Publishers.

Zaaiman, H., van der Flier, H., & Thijs, G. D. (2001). Dynamic testing in selection for an educational programme: Assessing South African performance on the Raven Progressive Matrices. International Journal of Selection and Assessment, 9, 258–269.

Zolezzi, S. A. (1992). Alternative selection measures for university undergraduate admissions. Unpublished MEd dissertation, University of the Witwatersrand, South Africa.

Zolezzi, S. A. (1995). The effectiveness of dynamic assessment as an alternative aptitude testing strategy. Unpublished DPhil dissertation, University of South Africa, South Africa.

295 J. te Nijenhuis et al. / Intelligence 35 (2007) 283–300

these findings could be interpreted as an indication that Feuerstein's Mediated Learning Experience is not g-loaded, in contrast with regular trainings that are clearly g-loaded. Substantial negative correlations between gain scores and RSPM total scores are no definite proof of this hypothesis, but they are in line with it. Further support for our hypothesis that the Feuerstein training has little or no g loadedness comes from Coyle (2006), who showed that gain scores loaded virtually zero on the g factor. Moreover, Skuy et al. reported that the predictive validity of their measure did not increase when the second Raven score was used. The fact that individuals with low g gained more than those with high g could be interpreted as an indication that the Mediated Learning Experience was not g-loaded. It should be noted, however, that Feuerstein most likely did not intend his intervention to be g-loaded. He was interested in increasing the performance of low scorers on both tests and external criteria.

18. General discussion

IQ scores are by far the best general predictor of success in education, job training, and work. However, there are many ways in which these IQ scores can be increased, for instance by means of retesting or participating in a learning potential training program. What conclusions can be drawn from such score gains? Jensen's (1998a) hypothesis that the effects of training on abilities can be summarized in terms of Carroll's three-stratum hierarchical factor model was tested in a meta-analysis of test–retest data from Dutch, British, and American test batteries, and with learning potential data from South Africa using Raven's Progressive Matrices. The meta-analysis convincingly shows that test–retest score gains are not g-loaded. The findings from the learning potential study are clearly in line with this: when the attenuation caused by unreliability and other artifacts is taken into account, the correlation between the g loadings of items and the gains on items has a value somewhat comparable to the one found in the meta-analysis of test batteries. The data suggest that the g loadedness of item scores decreases after the intervention training. Te Nijenhuis et al.'s (2001) finding that practice and coaching reduced the g loadedness of their test scores strengthens the present findings using item scores. The findings show that it is not the high-g participants who increase their scores the most, as is common in training situations, but the low-g persons who show the largest increases in their scores. This suggests that the intervention training is not g-loaded.
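The argument above relies on correcting an observed correlation for attenuation. A minimal sketch of Spearman's classic correction for unreliability follows. It divides the observed correlation by the square root of the product of the two reliabilities. The function name and all numbers are hypothetical illustrations, not values from the South African study, and the sketch ignores the "other artifacts" the text mentions.

```python
import math


def disattenuate(r_xy: float, rel_x: float, rel_y: float) -> float:
    """Spearman's correction for attenuation: r_xy / sqrt(rel_x * rel_y)."""
    if not (0.0 < rel_x <= 1.0 and 0.0 < rel_y <= 1.0):
        raise ValueError("reliabilities must lie in (0, 1]")
    return r_xy / math.sqrt(rel_x * rel_y)


# Hypothetical inputs for illustration only (not values from the study).
observed = -0.40
corrected = disattenuate(observed, rel_x=0.50, rel_y=0.50)
print(f"observed r = {observed:.2f}, corrected r = {corrected:.2f}")

# The correction divides by a number below 1, so when sampling error has
# already inflated the observed r, the corrected value can fall outside [-1, 1].
print(f"{disattenuate(-0.90, 0.80, 0.80):.3f}")
```

The second call shows why a corrected coefficient such as a meta-analytic r slightly beyond −1.00 is not by itself a computational error.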

Our findings fit quite well with the hierarchical model of intelligence. The generalizability of test scores resides predominantly in the g component, whereas the test-specific ability component and the narrow ability component are virtually non-generalizable. This is evidenced, for instance, by the earlier finding that adding verbal tests or numerical tests to a g score resulted in only a very small incremental validity (Ree & Earles, 1991; Ree et al., 1994). Additionally, Ericsson and Lehmann (1996) reported immense gains on a memory task focusing on one narrow ability, but did not find any improvement on comparable memory tasks focusing on another narrow ability. As the score gains are not related to g, the generalizable g component decreases. Since the Feuerstein training itself quite possibly is not g-loaded either, it is easy to understand why the score gains did not generalize to scores on the cognitively loaded Representational Stencil Design Test. For a similar reason, the score gains did not generalize to g-loaded external criteria: the correlation of the RSPM scores with performance in the end-of-year psychology examination did not significantly improve after mediation. Reeve and Lam (2005) claimed that retesting does not change the nature of what is being tested, but our findings suggest the opposite.

19. Limitations of the studies

Our meta-analysis and our analysis of the South African study rely heavily on the method of correlated vectors (MCV), which has recently been shown to have limitations. Dolan and Lubke (2001) have shown that when groups are compared, substantial positive vector correlations can still be obtained even when the groups differ not only on g but also on factors uncorrelated with g. Ashton and Lee (2005) show that associations of a variable with non-g sources of variance can produce a vector correlation of zero even when the variable is strongly associated with g. They suggest that the g loadings of a subtest are sensitive to the nature of the other subtests in a battery, so that a specific sample of subtests may cause a spurious correlation between the vectors. Notwithstanding these limitations, studies using MCV continue to appear (see, for instance, Colom, Jung, & Haier, in press; Hartmann, Kruuse, & Nyborg, in press; Lee et al., 2006). The outcomes of our meta-analysis of a large number of studies using MCV may make an interesting contribution to the discussion on the limitations of the method.
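At its core, MCV is a single correlation computed across subtests. One vector holds each subtest's g loading, and the other holds the same subtest's standardized score gain (d). A minimal sketch under that reading, with a plain Pearson correlation, follows. The six-subtest battery is hypothetical. The published analyses additionally correct for artifacts such as unreliability, and this sketch omits those corrections.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Plain Pearson product-moment correlation between two equal-length vectors."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two vectors of equal length (at least 3 entries)")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlated_vectors(g_loadings: list[float], gains_d: list[float]) -> float:
    """Method of correlated vectors: correlate the subtests' g loadings with
    their standardized score gains (one entry per subtest, same order)."""
    return pearson_r(g_loadings, gains_d)


# Hypothetical six-subtest battery (illustrative numbers, not study data).
g = [0.80, 0.72, 0.65, 0.55, 0.48, 0.40]  # g loading of each subtest
d = [0.10, 0.18, 0.22, 0.35, 0.41, 0.50]  # retest gain of each subtest, in SD units
print(f"r(g, d) = {correlated_vectors(g, d):.3f}")
```

A strongly negative value, as here, is the pattern the text describes: the most g-loaded subtests show the smallest gains. With only a handful of subtests per battery, a single vector correlation is very noisy, which is part of the motivation for pooling many of them meta-analytically.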

A principle of meta-analysis is that the amount of information contained in one individual study is quite modest. Therefore, one should carry out an analysis of all studies on one topic and correct for artifacts, which leads to a strong increase in the amount of information. The fact that our meta-analytical value of r = −1.06 is virtually identical to the theoretically expected correlation between g and d of −1.00 holds some promise that a psychometric meta-analysis of studies using MCV is a powerful way of reducing some of the limitations of MCV. An alternative methodological approach is to limit oneself to the rare datasets that permit structural equation modeling. However, from a meta-analytical point of view, these studies yield only a quite modest amount of information.
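The pooling step can be sketched as the "bare-bones" stage of a Hunter–Schmidt meta-analysis (Hunter & Schmidt, 2004). This stage computes a sample-size-weighted mean correlation and splits the observed variance into expected sampling-error variance, (1 − r̄²)² / (N̄ − 1), and residual variance. This is a generic sketch, not the authors' exact procedure. Weighting by participant N is an assumption, the `Study` type is hypothetical, and further artifact corrections (e.g., for unreliability) that can push a mean such as −1.06 past −1.00 are not included.

```python
from dataclasses import dataclass


@dataclass
class Study:
    r: float  # observed correlation reported by one study
    n: int    # that study's sample size


def bare_bones(studies: list[Study]) -> dict[str, float]:
    """Sample-size-weighted mean r and the split of observed variance into
    sampling-error variance and residual variance (Hunter-Schmidt style)."""
    total_n = sum(s.n for s in studies)
    mean_r = sum(s.n * s.r for s in studies) / total_n
    var_obs = sum(s.n * (s.r - mean_r) ** 2 for s in studies) / total_n
    mean_n = total_n / len(studies)
    var_err = (1 - mean_r ** 2) ** 2 / (mean_n - 1)
    return {
        "mean_r": mean_r,
        "var_obs": var_obs,
        "var_err": var_err,
        "var_res": max(var_obs - var_err, 0.0),
        # share of observed variance attributable to sampling error, capped at 100%
        "pct_sampling_error": 100 * min(var_err / var_obs, 1.0) if var_obs > 0 else 100.0,
    }


# Hypothetical studies (illustrative only).
result = bare_bones([Study(-0.80, 100), Study(-0.60, 300)])
print({k: round(v, 4) for k, v in result.items()})
```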

Additional meta-analyses of studies employing MCV are necessary to establish the validity of combining MCV with psychometric meta-analysis. Most likely, many would agree that a high positive meta-analytical correlation between measures of g and measures of another construct implies that g plays a major role, and that a meta-analytical correlation of −1.00 implies that g plays no role. However, it is not clear what value of the meta-analytical correlation to expect from MCV when g plays only a modest role. After the present meta-analysis on a construct that clearly has an inverse relationship with g, it would be informative to carry out meta-analyses of studies on variables that are strongly linked to g and on variables that are modestly linked to g. An example of the latter would be secular score gains, which according to Lynn's (1990) nutrition theory should be modestly g-loaded.

The sample sizes in the South African study are not large, but they are still larger than those in many other studies of learning potential, where an N ≈ 10 is not unusual. A reanalysis of the many existing studies on dynamic testing could lead to a meta-analysis with a large combined N. The mean posttest score was quite high, so a ceiling effect may have occurred for the White/Indian/Colored group, leading to an underestimation of the experimental score gain for this group.

Instead of testing the hypothesis with a strongly unidimensional test such as the RSPM, it would be better to use a multidimensional test. Moreover, a large sample size would allow the use of more rigorous data-analytical techniques, leading to more definitive results. However, to the best of our knowledge, datasets meeting these requirements do not exist, and the Skuy et al. study is arguably the best South African learning potential study.

20. Score gains as low-quality measures of motivation

As criterion-related validity is strongly dependent on g, te Nijenhuis et al.'s finding of lowered g loadings after training should result in lowered criterion-related validity. However, the empirical findings show the opposite. Virtually all test–retest and test preparation studies on cognitive tests and scholastic aptitude tests that reported both criterion-related validities demonstrate small to modest increases in criterion-related validity for the second or third test score (see Allalouf & Ben-Shakhar, 1998; Bashi, 1976; Coyle, 2006; Hausknecht, Trevor, & Farr, 2002; Jones, 1986; Linn, 1977; Olsen & Schrader, 1959; Ortar, 1960; Powers, 1985; Reeve & Lam, 2005). In the carefully designed study by Allalouf and Ben-Shakhar (1998) of a university entrance test, the experimental group received an intensive 40-h test coaching program, while the control group did not. The criterion-related validity for the retest increased for both groups. Most importantly, the increase was the same for both groups; it was not larger for the experimental group.

In a little-known but carefully designed large-scale learning potential study, Resing (1990, see Table 4.23) compared an experimental group that received a pretest, a learning potential training, and a posttest against a control group that received only the pretest and the posttest. The mean criterion-related validity of the various second scores was .62 for both the experimental and the control group. Learning potential training did not result in incremental criterion-related validity over and above the validity resulting from simple retesting. The findings of both Resing and Allalouf and Ben-Shakhar suggest that cognitive interventions do not increase criterion-related validity more than simple retesting does.

g and the personality measure conscientiousness have been shown to make an excellent combination of predictors (Schmidt & Hunter, 1998). Conscientiousness represents, among other characteristics, persistence, a will to achieve, and the ability to focus effort on the goal. A field study on test preparation using actual job applicants (Clause, Delbridge, Schmitt, Chan, & Jennings, 2001) showed that motivation to perform well on the test correlated .25 with test performance. One could speculate that score increases do not reflect a true cognitive component but instead become low-quality measures of motivation. Further, since the increase in validity due to retesting and learning potential training is modest compared with the large increase obtainable from personality questionnaires, personality testing might provide a less expensive and more accurate alternative.
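Why adding a second, largely independent predictor such as conscientiousness helps can be illustrated with the standard formula for the multiple correlation of a criterion on two predictors:

R² = (r₁² + r₂² − 2·r₁·r₂·r₁₂) / (1 − r₁₂²)

The sketch below implements that formula. All validities and the inter-predictor correlation are hypothetical placeholders, not the values reported by Schmidt and Hunter (1998).

```python
import math


def multiple_r(r1: float, r2: float, r12: float) -> float:
    """Multiple correlation of a criterion with two predictors.

    r1, r2: validity of each predictor; r12: correlation between the predictors.
    """
    if abs(r12) >= 1.0:
        raise ValueError("predictors must not be perfectly correlated")
    r_squared = (r1 ** 2 + r2 ** 2 - 2 * r1 * r2 * r12) / (1 - r12 ** 2)
    return math.sqrt(r_squared)


# Hypothetical validities, chosen only for illustration.
r_g, r_consc, r_between = 0.50, 0.30, 0.00
combined = multiple_r(r_g, r_consc, r_between)
print(f"R = {combined:.3f}, gain over g alone = {combined - r_g:.3f}")
```

The gain is largest when the two predictors are nearly uncorrelated. If the second predictor carried only information already in g (r₂ = r₁·r₁₂), R would collapse back to r₁.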

21. Effectiveness of various training formats

Components of the mediation training used by Skuy et al. (2002) are similar to the test training used by te Nijenhuis et al. (2001). Both the Dutch training and the South African training took 3 h, but whereas the Dutch training focused on two different test formats, the South African training dealt with only one test format. The test training by Lloyd and Pidgeon (1961) took even less time, namely two half-hour segments, each focusing on one test format. The effect sizes in all studies were roughly comparable. This suggests that the methodologies employed by te Nijenhuis et al. and by Lloyd and Pidgeon were more efficient than those used by Skuy et al. It is possible that the components of the mediation training that are not present in the other two training formats are not effective in raising test scores and could therefore be left out. If so, it might be possible to increase scores on the RSPM by one SD with a relatively simple 1-h training.
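Comparing effect sizes across training formats, and speaking of a gain of "one SD", presupposes a standardized gain. A minimal sketch follows that expresses the mean pretest-to-posttest gain in pretest SD units, one common choice in the spirit of Cohen (1988). Whether each cited study used this standardizer or another one is not stated here. The function name and the raw scores are hypothetical.

```python
from statistics import mean, stdev


def standardized_gain(pre: list[float], post: list[float]) -> float:
    """Mean pretest-to-posttest gain expressed in pretest SD units.

    Other standardizers (pooled SD, SD of the difference scores) give
    different values, so effect sizes are only comparable if computed alike.
    """
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("need paired pre/post scores for at least two people")
    return (mean(post) - mean(pre)) / stdev(pre)


# Hypothetical raw scores for five examinees (not data from any cited study).
pre = [30, 34, 38, 42, 46]
post = [36, 40, 44, 47, 51]
print(f"d = {standardized_gain(pre, post):.2f}")
```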

22. Generalizability of findings

Can these findings of hollow score gains after test–retest, test practice, and Mediated Learning Experience training be generalized to other studies in which training-induced score gains were found? Ericsson and Lehmann (1996) reported tremendous score increases after intensive training on numeric memory tests, but these gains did not generalize in the least to verbal memory tests. Such gains on one narrow ability do not generalize to another narrow ability clustering under the same broad ability, and are therefore hollow. Similarly, Jensen (1998b) showed that score gains due to adoption were not on the g factor and were therefore most likely hollow.

Rushton (1999) argued that intergenerational score gains are not linked to g, suggesting that the Flynn effect may be empty, but he was strongly criticized by Flynn (1999, 2000). In studies on the Flynn effect, the score gains found in cross-sectional studies are largest on the RSPM (Flynn, 1987). Lynn (1998) has suggested that a substantial part of these intergenerational score gains on the RSPM is generalizable (that is, it reflects higher g), but that the remaining part is hollow and should be interpreted as a schooling effect. The RSPM does require the application of the mathematical principles of addition, subtraction, progression, and the distribution of values. In the three decades (1950s–1980s) over which these increases in RSPM scores occurred, increasing proportions of 15- to 18-year-olds remained in school, where they learned math skills that they applied to the solution of matrices problems. Our findings could be interpreted as support for Lynn's hypothesis of the partial hollowness of score gains on the RSPM. Notwithstanding the high g loading of the RSPM sum score, it is quite sensitive to test–retest effects and training effects. Some studies on the Flynn effect (Lynn & Hampson, 1986; Teasdale & Owen, 1989) show that the increase in scores is largely concentrated in the lower segments of the IQ distribution. Our finding that low scorers show the largest gains after training may additionally support the notion that part of the Flynn effect on the RSPM is hollow. Finally, Wicherts et al.'s (2004) findings show that in some of their datasets the secular score gains are most strongly linked to broad, narrow, and test-specific abilities, showing that an important part of the gains is non-generalizable.

Ceci (1991) showed that increased schooling leads to higher IQ scores, but are these gains highly specific or predominantly generalizable? It would be interesting to apply the techniques used in this study to the findings from previous intervention studies. It may be that biological interventions (such as diet, vitamin supplements, and vaccination against infectious disease), rather than psychological or educational interventions, are the most cost-effective method of producing true changes in g and broad abilities. There may also be a biological barrier between the first stratum and the second stratum that restricts the effects of behavioral interventions to narrow abilities and test specificities.

Acknowledgement

We would like to thank Mervyn Skuy for his permission to use his dataset.

Thanks to Marié de Beer, Raegan Murphy, Welko Tomic, Art Jensen, and Frank Schmidt for feedback on previous versions of this paper.

Thanks also to Arne Evers, Wilma Resing (Dutch Test Committee), and Andress Kooij (Harcourt) for their help in locating test–retest studies.

References

Ackerman, P. L. (1986). Individual differences in information processing: An investigation of intellectual abilities. Intelligence, 10, 101–139.

Ackerman, P. L. (1987). Individual differences in skill learning: An integration of psychometric and information processing skills. Psychological Bulletin, 102, 3–27.

Allalouf, A., & Ben-Shakhar, G. (1998). The effect of coaching on the predictive validity of scholastic aptitude tests. Journal of Educational Measurement, 35(1), 31–47.

Ashton, M. C., & Lee, K. (2005). Problems with the method of correlated vectors. Intelligence, 33, 431–444.

Bashi, Y. (1976). Verbal and non-verbal abilities of 4th, 6th and 8th grade students in the Arab educational system in Israel. Jerusalem: Hebrew University, School of Education.

Bleichrodt, N., Resing, W. C. M., Drenth, P. J. D., & Zaal, J. N. (1987). Intelligentie-meting bij kinderen: Empirische en methodologische verantwoording van de geReviseerde Amsterdamse Kinder Intelligentie Test [Measuring the intelligence of children: Empirical and methodological justification of the Revised Amsterdam Children Intelligence Test]. Lisse, the Netherlands: Swets.

Bennett, G. K., Seashore, H. G., & Wesman, A. G. (1974). Differential Aptitude Tests (5th ed.). Manual. New York: The Psychological Corporation.

Boeyens, J. C. A. (1989). Learning potential: An empirical investigation. Pretoria, South Africa: Human Science Research Council.

Bosch, F. (1973). Inventarisatie, beschrijving en onderzoek m.b.t. de wijzigingen van de GATB, incl. test-hertest onderzoek (No. Pz3bRp0120) [Stock-taking, description, and research concerning the modifications of the GATB, including a test–retest study]. Utrecht, the Netherlands: Nederlandse Spoorwegen.

Carroll, J. B. (1993). Human cognitive abilities: A survey of factor-analytic studies. Cambridge: Cambridge University Press.

Ceci, S. J. (1991). How much does schooling influence general intelligence and its cognitive components? A reassessment of the evidence. Developmental Psychology, 27, 703–722.

Christian, K., Bachman, H. J., & Morrison, F. J. (2001). Schooling and cognitive development. In R. J. Sternberg & E. L. Grigorenko (Eds.), Environmental effects on cognitive abilities (pp. 287–335). Mahwah, NJ: Erlbaum.

Clause, C. S., Delbridge, K., Schmitt, N., Chan, D., & Jennings, D. (2001). Test preparation activities and employment test performance. Human Performance, 14, 149–167.

Cohen, J. (1988). Statistical power analysis for the behavioral sciences. Hillsdale: Lawrence Erlbaum.

Colom, R., Jung, R. E., & Haier, R. J. (in press). Finding the g-factor in brain structure using the method of correlated vectors. Intelligence.

Covin, T. A. (1977). Stability of the WISC-R for 9-year-olds with learning difficulties. Psychological Reports, 40, 1297–1298.

Coyle, T. R. (2006). Test–retest changes on scholastic aptitude tests are not related to g. Intelligence, 34, 15–27.

Cronbach, L. J. (1990). Essentials of psychological testing. New York: HarperCollins.

de Villiers, A. B. (1999). Disadvantaged students' academic performance: Analysing the zone of proximal development. Unpublished DPhil thesis, University of Cape Town, South Africa.

de Wolff, C. J., & Buiten, B. (1963). Een factoranalyse van vier testbatterijen [A factor analysis of four test batteries]. Nederlands Tijdschrift voor Psychologie, 18, 220–239.

Dolan, C. V., & Lubke, G. (2001). Viewing Spearman's hypothesis from the perspective of multigroup PCA: A comment on Schönemann's criticism. Intelligence, 29, 231–245.

Drenth, P. J. D., Petrie, J. F., & Bleichrodt, N. (1968). Handleiding bij de Amsterdamse Kinder Intelligentie Test [Manual of the Amsterdam Children Intelligence Test]. Amsterdam: Vrije Universiteit.

Elliott, C. D. (1983). British Ability Scales. Manual 2: Technical handbook. Windsor, Great Britain: NFER-Nelson.

Engelbrecht, M. (1999). Leerpotensiaal as voorspeller van akademiese sukses van universiteitsstudente [Learning potential as predictor of the academic success of university students]. Unpublished DPhil thesis, Potchefstroom University for Christian Higher Education, South Africa.

Ericsson, K. A., & Lehmann, A. C. (1996). Expert and exceptional performance: Evidence of maximal adaptation to task constraints. Annual Review of Psychology, 47, 273–305.

Evers, A., & Lucassen, W. (1991). Handleiding DAT 83: Differentiële Aanleg Testserie [Manual DAT 83: Differential Aptitude Test series]. Amsterdam: Swets.

Fleishman, E. A., & Hempel, W. E. (1955). The relation between abilities and improvement with practice in a visual discrimination reaction task. Journal of Experimental Psychology, 49, 301–312.

Flynn, J. R. (1987). Massive IQ gains in 14 nations: What IQ tests really measure. Psychological Bulletin, 101, 171–191.

Flynn, J. R. (1999). Evidence against Rushton: The genetic loading of WISC-R subtests and the causes of between-group IQ differences. Personality and Individual Differences, 26, 373–379.

Flynn, J. R. (2000). IQ gains, WISC subtests and fluid g: g theory and the relevance of Spearman's hypothesis to race. In G. R. B. J. Goode (Ed.), The nature of intelligence (pp. 202–227). New York: Wiley.

Gaydon, V. P. (1988). Predictors of performance of disadvantaged adolescents on the Soweto/Alexandra gifted child programme. Unpublished MEd dissertation, University of the Witwatersrand, South Africa.

Gottfredson, L. S. (1997). Why g matters: The complexity of everyday life. Intelligence, 24(1), 79–132.

Gottfredson, L. S. (2002). g: Highly general and highly practical. In R. J. Sternberg & E. L. Grigorenko (Eds.), The general intelligence factor: How general is it? (pp. 331–380). Mahwah, NJ: Erlbaum.

Grigorenko, E. L., & Sternberg, R. J. (1998). Dynamic testing. Psychological Bulletin, 124, 75–111.

Haeck, W., Yeld, N., Conradie, J., Robertson, N., & Shall, A. (1997). A developmental approach to mathematics testing for university admissions and course placement. Educational Studies in Mathematics, 33, 71–91.

Hartmann, P., Kruuse, N. H. S., & Nyborg, H. (in press). Testing the cross-racial generality of Spearman's hypothesis in two samples. Intelligence.

Hausknecht, J. P., Trevor, C. O., & Farr, J. L. (2002). Retaking ability tests in a selection setting: Implications for practice effects, training performance, and turnover. Journal of Applied Psychology, 87(2), 243–254.

Hunter, J. E., & Schmidt, F. L. (1990). Methods of meta-analysis. London: Sage.

Hunter, J. E., & Schmidt, F. L. (2004). Methods of meta-analysis (2nd ed.). London: Sage.

Jensen, A. R. (1980). Bias in mental testing. London: Methuen.

Jensen, A. R. (1985). The nature of the black–white difference on various psychometric tests: Spearman's hypothesis. Behavioral and Brain Sciences, 8, 193–263.

Jensen, A. R. (1998a). The g factor: The science of mental ability. London: Praeger.

Jensen, A. R. (1998b). Adoption data and two g-related hypotheses. Intelligence, 25, 1–6.

Johnson, W., Bouchard, T. J., Jr., Krueger, R. F., McGue, M., & Gottesman, I. I. (2004). Just one g: Consistent results from three test batteries. Intelligence, 32, 95–107.

Johnson, W., te Nijenhuis, J., & Bouchard, T. J., Jr. (in press). Replication of the hierarchical visual-perceptual-image rotation model in de Wolff and Buiten's (1963) battery of 46 tests of mental ability. Intelligence.

Jones, R. J. (1986). A comparison of the predictive validity of the MCAT for coached and uncoached students. Journal of Medical Education, 61, 335–338.

Kaufman, A. S., & Kaufman, N. L. (1983). K-ABC: Kaufman Assessment Battery for Children. Interpretive manual. Circle Pines, MN: AGS.

Kooij, A. P., Rolfhus, E., Wilkins, C., Yang, Z., & Zhu, J. (2005). WAIS-III Nederlandstalige bewerking: Technisch rapport hernormering [WAIS-III adaptation in Dutch: Technical report on renorming]. Amsterdam: Harcourt.

Kort, W., Schittekatte, M., Dekker, P. H., Verhaeghe, P., Compaan, E. L., Bosmans, M., & Vermeir, G. (2005). WISC-III NL: Wechsler Intelligence Scale for Children, Derde Editie NL. Handleiding en verantwoording [The Dutch WISC-III: Wechsler Intelligence Scale for Children, Third Edition for the Netherlands. Manual and justification]. Amsterdam: NIP.

Kulik, J. A., Bangert-Drowns, R. L., & Kulik, C. C. (1984). Effectiveness of coaching for aptitude tests. Psychological Bulletin, 95, 179–188.

Kulik, J. A., Kulik, C. C., & Bangert, R. L. (1984). Effects of practice on aptitude and achievement test scores. American Educational Research Journal, 21, 435–447.

Lee, K. H., Choi, Y. Y., Gray, J. R., Cho, S. H., Chae, J.-H., Lee, S., et al. (2006). Neural correlates of superior intelligence: Stronger recruitment of posterior parietal cortex. NeuroImage, 29(2), 578–586.

Linn, R. L. (1977). On the treatment of multiple scores for Law School Admission Test repeaters (Report LSAC-77-4). In Law School Admission Council, Reports of LSAC sponsored research: Volume III, 1975–1977. Princeton, NJ: Law School Admission Council.

Lipson, L. E. (1992). Relationship of static and dynamic measures to scholastic achievement of black pupils. Unpublished MEd dissertation, University of the Witwatersrand, South Africa.

Lloyd, F., & Pidgeon, D. A. (1961). An investigation into the effects of coaching on non-verbal test material with European, Indian and African children. British Journal of Educational Psychology, 31, 145–151.

Luteijn, F., & Barelds, D. P. H. (2005). GIT2: Groninger Intelligentie Test 2 [GIT2: Groningen Intelligence Test 2]. Amsterdam: Harcourt.

Lynn, R. (1990). The role of nutrition in secular increases in intelligence. Personality and Individual Differences, 11, 273–285.

Lynn, R. (1998). In support of the nutrition theory. In U. Neisser (Ed.), The rising curve: Long-term gains in IQ and related measures (pp. 207–215). Washington, DC: American Psychological Association.

Lynn, R., Allik, J., & Irwing, P. (2004). Sex differences on three factors identified in Raven's Standard Progressive Matrices. Intelligence, 32, 411–424.

Lynn, R., & Hampson, S. (1986). The rise of national intelligence: Evidence from Britain, Japan and the USA. Personality and Individual Differences, 7, 23–32.

Matarazzo, J. D., Carmody, T. P., & Jacobs, L. D. (1980). Test–retest reliability and stability of the WAIS: A literature review with implications for clinical practice. Journal of Clinical Neuropsychology, 2(2), 89–105.

McCormick, B. K., Dunlap, W. P., Kennedy, R. S., & Jones, M. B. (1983). The effects of practice on the Armed Forces Vocational Aptitude Test Battery. US Army Research Institute for the Behavioral and Social Sciences, Technical Report 602.

Mulder, J. L., Dekker, R., & Dekker, P. H. (2004). Kaufman Intelligentietest voor adolescenten en volwassenen (KAIT). Handleiding [Kaufman Intelligence Test for adolescents and adults (KAIT). Manual]. Leiden, the Netherlands: PITS.

Murphy, R. (2002). A review of South African research in the field of dynamic assessment. Unpublished MA dissertation, University of Pretoria (available online from http://upetd.up.ac.za/thesis/available/etd-05042002-161239).

Nel, A. (1997). Die voorspelling van akademiese sukses binne kontekst van 'n alternatiewe universiteitstoelatingsbeleid [The prediction of academic success within the context of an alternative policy of university admission]. Unpublished MA dissertation, Rand Afrikaans University, South Africa.

Neubauer, A. C., & Freudenthaler, H. H. (1994). Reaction time in a sentence-picture verification test and intelligence: Individual strategies and effects of extended practice. Intelligence, 19, 193–218.

Nunnally, J. C., & Bernstein, I. H. (1994). Psychometric theory (3rd ed.). New York: McGraw-Hill.

Olsen, M., & Schrader, W. B. (1959). The use of preliminary and final Scholastic Aptitude Test scores in predicting college grades (College Entrance Examination Board Research and Development Reports and Statistical Reports 59-19). Princeton, NJ: Educational Testing Service.

Ortar, G. R. (1960). Improving test validity by coaching. Educational Research, 2, 137–142.

Powers, D. E. (1985). Effects of test preparation on the validity of a graduate admission test. Applied Psychological Measurement, 9, 179–190.

Raven, J., Raven, J. C., & Court, J. H. (2000). Standard Progressive Matrices. Raven manual: Section 3. Oxford: Oxford Psychologists Press.

Ree, M. J., & Carretta, T. R. (1994). The correlation of general cognitive ability and psychomotor tracking tests. International Journal of Selection and Assessment, 2, 209–216.

Ree, M. J., & Earles, J. A. (1991). Predicting training success: Not much more than g. Personnel Psychology, 44, 321–332.

Ree, M. J., Earles, J. A., & Teachout, M. S. (1994). Predicting job performance: Not much more than g. Journal of Applied Psychology, 79, 518–524.

Reeve, C. L., & Lam, H. (2005). The psychometric paradox of practice effects due to retesting: Measurement invariance and stable ability estimates in the face of observed score changes. Intelligence, 33, 535–549.

Resing, W. C. M. (1990). Intelligentie en leerpotentieel: Een onderzoek naar het leerpotentieel van jonge leerlingen uit het basis- en speciaal onderwijs [Intelligence and learning potential: A study into the learning potential of young students in basic and special education]. Amsterdam, the Netherlands: Swets.

Rushton, J. P. (1999). Secular gains in IQ are not related to the g factor and inbreeding depression, unlike black–white differences: A reply to Flynn. Personality and Individual Differences, 26, 381–389.

Rushton, J. P., Skuy, M., & Bons, T. A. (2004). Construct validity of Raven's Advanced Progressive Matrices for African and non-African engineering students in South Africa. International Journal of Selection and Assessment, 12(3), 220–229.

Schmidt, F. L., & Hunter, J. E. (1998). The validity and utility of selection methods in personnel psychology: Practical and theoretical implications of 85 years of research findings. Psychological Bulletin, 124(2), 262–274.

Schmidt, F. L., & Hunter, J. E. (1999). Theory testing and measurement error. Intelligence, 27(3), 183–198.

Schmidt, F. L., & Le, H. (2004). Software for the Hunter-Schmidt meta-analysis methods. University of Iowa, Department of Management and Organization, Iowa City, IA 42242.

Schroots, J. J. F., & van Alphen de Veer, R. J. (1979). LDT: Leidse Diagnostische Test. Deel 1: Handleiding [LDT: Leiden Diagnostic Test. Part 1: Manual]. Lisse, the Netherlands: Swets.

Shochet, I. M. (1986). Manifest and potential performance in advantaged and disadvantaged students. Unpublished DPhil dissertation, University of the Witwatersrand, South Africa.

Skuy, M., Gewer, A., Osrin, Y., Khunou, D., Fridjon, P., & Rushton, J. P. (2002). Effects of mediated learning experience on Raven's Matrices scores of African and non-African university students in South Africa. Intelligence, 30, 221–232.

Swanson, H. E., & Lussier, C. M. (2001). A selective synthesis of the experimental literature on dynamic assessment. Review of Educational Research, 71, 321–363.

Teasdale, T. W., & Owen, D. R. (1989). Continuing secular increase in intelligence and a stable prevalence of high intelligence levels. Intelligence, 13, 255–262.

Tuma, J. M., & Appelbaum, A. S. (1980). Reliability and practice effects of WISC-R IQ estimates in a normal population. Educational and Psychological Measurement, 40, 671–678.

te Nijenhuis, J., Tolboom, E., Resing, W., & Bleichrodt, N. (2004). Does cultural background influence the intellectual performance of children from immigrant groups? Validity of the RAKIT intelligence test for immigrant children. European Journal of Psychological Assessment, 20, 10–26.

te Nijenhuis, J., & van der Flier, H. (1997). Comparability of GATB scores for immigrants and majority group members: Some Dutch findings. Journal of Applied Psychology, 82, 675–687.

te Nijenhuis, J., Voskuijl, O. F., & Schijve, N. B. (2001). Practice and coaching on IQ tests: Quite a lot of g. International Journal of Selection and Assessment, 9, 302–308.

Thorndike, R. L. (1985). The central role of general ability in prediction. Multivariate Behavioral Research, 20, 241–254.

Tucker, L. R., Damarin, F., & Messick, S. (1966). A base-free measure of change. Psychometrika, 31(4), 457–473.

van der Doef, M. P., Kwint, J. M., & van der Koppel (1989). Wat leren moeilijk lerende kinderen van de WISC-R? [What do children who have difficulties in learning learn from the WISC-R?]. Kind en Adolescent, 10, 136–141.

United States Department of Labor. (1970). Manual for the USTES General Aptitude Test Battery. Section III: Development. Washington, DC: United States Department of Labor.

van Geffen (1972). De betrouwbaarheid van de GATB 1002-B op brugklasniveau [The reliability of the GATB 1002-B for the first class at secondary school]. Catholic University Nijmegen, the Netherlands, Psychology of Work and Organisation.

van Haasen, P. P., de Bruyn, E. E. J., Pijl, Y. J., Poortinga, Y. H., Lutje Spelberg, H. C., Vander Steene, G., et al. (1986). WISC-R: Wechsler Intelligence Scale for Children-Revised, Nederlandstalige uitgave [WISC-R: Wechsler Intelligence Scale for Children-Revised, Dutch edition]. Lisse, the Netherlands: Swets.

Wechsler, D. (1955). Manual for the Wechsler Adult Intelligence Scale. New York: The Psychological Corporation.

Wechsler, D. (1967). Manual for the Wechsler Preschool and Primary Scale of Intelligence. New York: The Psychological Corporation.

Wechsler, D. (1974). Manual for the Wechsler Intelligence Scale for Children-Revised. New York: The Psychological Corporation.

Wechsler, D. (1981). WAIS-R manual: Wechsler Adult Intelligence Scale-Revised. New York: The Psychological Corporation.

Wechsler, D. (1997). WAIS-III: Wechsler Adult Intelligence Scale-third edition, and WMS-III: Wechsler Memory Scale-third edition. Technical manual. New York: The Psychological Corporation.

Wicherts, J. W., Dolan, C. V., Oosterveld, P., van Baal, G. C. V., Boomsma, D. I., & Span, M. M. (2004). Are intelligence tests measurement invariant over time? Investigating the nature of the Flynn effect. Intelligence, 32(5), 509–537.

Yeld, N., & Haeck, W. (1997). Educational histories and academic potential: Can tests deliver? Assessment and Evaluation in Higher Education, 22, 5–16.

Zaaiman, H. (1998). Selecting students for mathematics and science: The challenge facing higher education in South Africa. Pretoria, South Africa: HSRC Publishers.

Zaaiman, H., van der Flier, H., & Thijs, G. D. (2001). Dynamic testing in selection for an educational programme: Assessing South African performance on the Raven Progressive Matrices. International Journal of Selection and Assessment, 9, 258–269.

Zolezzi, S. A. (1992). Alternative selection measures for university undergraduate admissions. Unpublished MEd dissertation, University of the Witwatersrand, South Africa.

Zolezzi S A (1995) The effectiveness of dynamic assessment as analternative aptitude testing strategy Unpublished DPhil disserta-tion University of South Africa South Africa


all studies on one topic and correct for artifacts, leading to a strong increase in the amount of information. The fact that our meta-analytical value of r = −1.06 is virtually identical to the theoretically expected correlation between g and d of −1.00 holds some promise that a psychometric meta-analysis of studies using MCV is a powerful way of reducing some of the limitations of MCV. An alternative methodological approach is to limit oneself to the rare datasets that allow the use of structural equation modeling. However, from a meta-analytical point of view, these studies yield only a quite modest amount of information.
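MCV itself is a correlation computed across subtests: the vector of subtest g loadings is correlated with the vector of subtest score gains (d). The minimal Python sketch below uses invented subtest values and applies only one artifact correction, the classical disattenuation for the unreliability of the two vectors, with hypothetical reliabilities. A full psychometric meta-analysis in the Hunter and Schmidt tradition pools studies and corrects for several artifacts, so this is an illustration of the idea, not the procedure used in this study. It shows one way a corrected correlation can land slightly beyond −1, as the value of −1.06 does.

```python
import math
from statistics import mean


def pearson_r(x, y):
    """Plain Pearson correlation between two equal-length vectors."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def disattenuate(r_obs, rel_x, rel_y):
    """Classical correction for attenuation: r / sqrt(rel_x * rel_y)."""
    return r_obs / math.sqrt(rel_x * rel_y)


# Hypothetical subtest-level data: one entry per subtest of a battery.
g_loadings = [0.80, 0.72, 0.65, 0.58, 0.50, 0.41]
score_gains_d = [0.10, 0.18, 0.22, 0.31, 0.35, 0.46]

# MCV: correlate the two vectors across subtests.
r = pearson_r(g_loadings, score_gains_d)

# Hypothetical reliabilities of the two vectors; dividing by values < 1
# inflates |r| and can push the corrected value past -1.
rho = disattenuate(r, rel_x=0.90, rel_y=0.85)

print(f"observed MCV correlation: {r:.3f}")
print(f"corrected correlation:    {rho:.3f}")
```

Because the correction divides by the square root of reliabilities below 1, an observed correlation already close to −1 is pushed past it; a value such as −1.06 is therefore compatible with a true correlation of −1.00 plus estimation error.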

Additional meta-analyses of studies employing MCV are necessary to establish the validity of the combination of MCV and psychometric meta-analysis. Most likely, many would agree that a high positive meta-analytical correlation between measures of g and measures of another construct implies that g plays a major role, and that a meta-analytical correlation of −1.00 implies that g plays no role. However, it is not clear what value of the meta-analytical correlation to expect from MCV when g plays only a modest role. After the present meta-analysis on a construct that clearly has an inverse relationship with g, it would be informative to carry out meta-analyses of studies on variables that are strongly linked to g and on variables that are modestly linked to g. An example of the latter would be secular score gains, which according to Lynn's (1990) nutrition theory should be modestly g-loaded.

The sample sizes in the South African study are not large, but they are still larger than those in many other studies of learning potential, where an N ≈ 10 is not unusual. A reanalysis of the many existing studies on dynamic testing could lead to a meta-analysis with a large combined N. The mean posttest score was quite high, so a ceiling effect may have occurred for the White/Indian/Colored group, leading to an underestimation of the experimental score gain for this group.
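How a ceiling can shrink an observed gain can be shown with a small simulation. This is a sketch under stated assumptions only: the 60-point maximum corresponds to the 60 items of the RSPM, but the normal latent score distributions, the means, SDs, true gain, and sample size are all hypothetical and do not reproduce the Skuy et al. data.

```python
import random
from statistics import mean, stdev

MAX_SCORE = 60  # the RSPM consists of 60 items


def observed(score):
    """Round a latent score and clip it to the possible range 0..MAX_SCORE."""
    return max(0, min(MAX_SCORE, round(score)))


def cohens_d(pre, post):
    """Mean gain divided by the pooled (average-variance) SD of pre and post."""
    pooled_sd = ((stdev(pre) ** 2 + stdev(post) ** 2) / 2) ** 0.5
    return (mean(post) - mean(pre)) / pooled_sd


rng = random.Random(2006)
n = 2000  # hypothetical; large only to keep simulation noise small

# Hypothetical high-scoring group with a true gain of about 5 points.
latent_pre = [rng.gauss(52, 6) for _ in range(n)]
latent_post = [s + rng.gauss(5, 3) for s in latent_pre]

# What the test can actually record: nobody can score above 60.
obs_pre = [observed(s) for s in latent_pre]
obs_post = [observed(s) for s in latent_post]

true_gain = mean(latent_post) - mean(latent_pre)
observed_gain = mean(obs_post) - mean(obs_pre)
print(f"true mean gain:     {true_gain:.2f} points, d = {cohens_d(latent_pre, latent_post):.2f}")
print(f"observed mean gain: {observed_gain:.2f} points, d = {cohens_d(obs_pre, obs_post):.2f}")
```

Posttest scores pile up at the maximum more often than pretest scores, so the observed mean gain is smaller than the latent one; the closer the group mean is to the ceiling, the larger the underestimation.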

Instead of testing the hypothesis with a strongly unidimensional test such as the RSPM, it would be better to use a multidimensional test. Moreover, a large sample size would allow the use of more rigorous data-analytical techniques, leading to more definitive results. However, to the best of our knowledge, datasets meeting these requirements do not exist, and the Skuy et al. study is arguably the best South African learning potential study.

20. Score gains as low-quality measures of motivation

As criterion-related validity is strongly dependent on g, te Nijenhuis et al.'s finding of lowered g loadings after training should result in lowered criterion-related validity. However, the empirical findings show the opposite: virtually all test–retest and test preparation studies on cognitive tests and scholastic aptitude tests that reported both criterion-related validities demonstrate small to modest increases in criterion-related validity for the second or third test score (see Allalouf & Ben-Shakhar, 1998; Bashi, 1976; Coyle, 2006; Hausknecht, Trevor, & Farr, 2002; Jones, 1986; Linn, 1977; Olsen & Schrader, 1959; Ortar, 1960; Powers, 1985; Reeve & Lam, 2005). In the carefully designed study of a university entrance test by Allalouf and Ben-Shakhar (1998), the experimental group received an intensive 40-h test coaching program, while the control group did not. The criterion-related validity of the retest increased for both groups. Most importantly, the increase was the same: it was not larger for the experimental group.

In a little-known but carefully designed large-scale learning potential study, Resing (1990, see Table 4.23) compared an experimental group that received a pretest, a learning potential training, and a posttest against a control group that received only the pretest and the posttest. The mean criterion-related validity of the various second scores was .62 for both the experimental and the control group. Learning potential training did not result in incremental criterion-related validity over and above the validity resulting from simply retesting. The findings of both Resing and Allalouf and Ben-Shakhar suggest that cognitive interventions do not increase criterion-related validity more than simple retesting does.
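Statements such as "the validity was the same for both groups" rest on comparing correlation coefficients from independent samples. A standard tool for this is Fisher's r-to-z test, sketched below in Python. As a simplification it compares the retest validities of a trained and an untrained group directly; the validities and group sizes are hypothetical placeholders, not values from Allalouf and Ben-Shakhar (1998) or Resing (1990).

```python
import math


def fisher_z(r):
    """Fisher's variance-stabilizing transform of a correlation, atanh(r)."""
    return math.atanh(r)


def compare_independent_rs(r1, n1, r2, n2):
    """Two-sided z test of H0: rho1 == rho2 for correlations from independent samples.

    Returns (z, p). The standard error of z1 - z2 is sqrt(1/(n1-3) + 1/(n2-3)).
    """
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    z = (fisher_z(r1) - fisher_z(r2)) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value under the standard normal
    return z, p


# Hypothetical retest validities for a trained and an untrained group.
z_same, p_same = compare_independent_rs(0.62, 150, 0.62, 140)
print(f"equal validities:     z = {z_same:.2f}, p = {p_same:.3f}")

z_diff, p_diff = compare_independent_rs(0.70, 150, 0.55, 140)
print(f"different validities: z = {z_diff:.2f}, p = {p_diff:.3f}")
```

Working on the z scale matters because the sampling distribution of r is skewed near ±1, whereas atanh(r) is approximately normal with a variance that depends only on N.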

g and the personality measure conscientiousness have been shown to make an excellent combination of predictors (Schmidt & Hunter, 1998). Conscientiousness represents, among other characteristics, persistence, a will to achieve, and the ability to focus effort on a goal. A field study on test preparation using actual job applicants (Clause, Delbridge, Schmitt, Chan, & Jennings, 2001) showed that motivation to perform well on the test correlated .25 with test performance. One could speculate that score increases do not reflect a true cognitive component but rather become low-quality measures of motivation. Further, since the increase in validity due to retesting and learning potential training is modest in comparison to the large increase obtainable from the use of personality questionnaires, personality testing might provide a less expensive and more accurate alternative.
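Why a second predictor helps most when it is largely independent of the first follows from the standard formula for the multiple correlation of a criterion on two predictors, R² = (r₁² + r₂² − 2r₁r₂r₁₂) / (1 − r₁₂²). The Python sketch below evaluates it; all validities and intercorrelations are hypothetical illustration values, not estimates from Schmidt and Hunter (1998).

```python
import math


def multiple_r(r_y1, r_y2, r_12):
    """Multiple correlation of a criterion y on two predictors x1 and x2.

    r_y1, r_y2: validities of x1 and x2; r_12: correlation between the predictors.
    """
    r_squared = (r_y1**2 + r_y2**2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12**2)
    return math.sqrt(r_squared)


# Hypothetical values: a cognitive test plus an uncorrelated personality scale.
r_cognitive, r_personality = 0.50, 0.30
combined = multiple_r(r_cognitive, r_personality, 0.00)
print(f"validity of the test alone:   {r_cognitive:.2f}")
print(f"validity of both combined:    {combined:.2f} (gain {combined - r_cognitive:.2f})")

# Same second validity, but the second predictor overlaps strongly with the test.
overlapping = multiple_r(r_cognitive, r_personality, 0.60)
print(f"with an overlapping predictor: {overlapping:.2f}")
```

With these numbers the independent predictor raises the validity from .50 to about .58, whereas a predictor of equal validity that correlates .60 with the test adds nothing at all.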

21. Effectiveness of various training formats

Components of the mediation training used by Skuy et al. (2002) are similar to the test training used in te Nijenhuis et al. (2001). Both the Dutch training and the South African training took 3 h, but whereas the Dutch training focused on two different test formats, the South African training dealt with only one test format. The test training by Lloyd and Pidgeon (1961) took even less time, namely two half-hour segments, each focusing on one test format. The effect sizes in all studies were roughly comparable. This suggests that the methods employed by te Nijenhuis et al. and by Lloyd and Pidgeon were more efficient than those used by Skuy et al. It is possible that the components of the mediation training that are not present in the other two training formats are not effective in raising test scores and could therefore be left out. If so, it might be possible to increase scores on the RSPM by one SD with a relatively simple 1-h training.

22. Generalizability of findings

Can these findings of hollow score gains after test–retest, test practice, and Mediated Learning Experience training be generalized to other studies in which training-induced score gains were found? Ericsson and Lehmann (1996) reported tremendous score increases after intensive training on numeric memory tests, but these gains did not generalize in the least to verbal memory tests. Such gains on one narrow ability do not generalize to another narrow ability clustering under the same broad ability and are therefore hollow. Similarly, Jensen (1998b) showed that score gains due to adoption were not on the g factor and were therefore most likely hollow.

Rushton (1999) argued that intergenerational score gains are not linked to g, suggesting that the Flynn effect may be empty, but he was strongly criticized by Flynn (1999, 2000). In studies on the Flynn effect, the score gains found in cross-sectional studies are largest on the RSPM (Flynn, 1987). Lynn (1998) has suggested that a substantial part of these intergenerational score gains on the RSPM is generalizable (it does reflect higher g), but that the remaining part is hollow and should be interpreted as a schooling effect. The RSPM does require the application of the mathematical principles of addition, subtraction, progression, and the distribution of values. In the three decades (1950s–1980s) over which these increases in RSPM scores occurred, increasing proportions of 15- to 18-year-olds remained in school, where they learned math skills that they applied to the solution of matrices problems. Our findings could be interpreted as support for Lynn's hypothesis of the partial hollowness of score gains on the RSPM. Notwithstanding the high g loading of the sum score of the RSPM, it is quite sensitive to test–retest effects and training effects. Some studies on the Flynn effect (Lynn & Hampson, 1986; Teasdale & Owen, 1989) show that the increase in scores is largely concentrated in the lower segments of the IQ distribution. Our finding that low scorers show the largest gains after training may additionally support the notion that part of the Flynn effect on the RSPM is hollow. Finally, Wicherts et al.'s (2004) findings show that in some of their datasets the secular score gains are most strongly linked to broad, narrow, and test-specific abilities, showing that an important part of the gains is non-generalizable.

Ceci (1991) showed that increased schooling leads to higher IQ scores, but are these gains highly specific or predominantly generalizable? It would be interesting to apply the techniques we used in this study to the findings of previous intervention studies. It may be that biological interventions (such as diet, vitamin supplements, and vaccination against infectious disease), rather than psychological or educational interventions, are the most cost-effective method of producing true changes in g and broad abilities. There may be a biological barrier between the first stratum and the second stratum that restricts the effects of behavioral interventions to narrow abilities and test specificities.

Acknowledgement

We would like to thank Mervyn Skuy for his permission to use his dataset.

Thanks to Marié de Beer, Raegan Murphy, Welko Tomic, Art Jensen, and Frank Schmidt for feedback on previous versions of this paper.

Thanks also to Arne Evers, Wilma Resing (Dutch Test Committee), and Andress Kooij (Harcourt) for helping to locate test–retest studies.

References

Ackerman, P. L. (1986). Individual differences in information processing: An investigation of intellectual abilities. Intelligence, 10, 101–139.

Ackerman, P. L. (1987). Individual differences in skill learning: An integration of psychometric and information processing skills. Psychological Bulletin, 102, 3–27.

Allalouf, A., & Ben-Shakhar, G. (1998). The effect of coaching on the predictive validity of scholastic aptitude tests. Journal of Educational Measurement, 35(1), 31–47.

Ashton, M. C., & Lee, K. (2005). Problems with the method of correlated vectors. Intelligence, 33, 431–444.

Bashi, Y. (1976). Verbal and non-verbal abilities of 4th, 6th, and 8th grade students in the Arab educational system in Israel. Jerusalem: Hebrew University, School of Education.


Bleichrodt, N., Resing, W. C. M., Drenth, P. J. D., & Zaal, J. N. (1987). Intelligentiemeting bij kinderen: Empirische en methodologische verantwoording van de geReviseerde Amsterdamse Kinder Intelligentie Test [Measuring the intelligence of children: Empirical and methodological justification of the Revised Amsterdam Children Intelligence Test]. Lisse, the Netherlands: Swets.

Bennett, G. K., Seashore, H. G., & Wesman, A. G. (1974). Differential Aptitude Tests (5th ed.). Manual. New York: The Psychological Corporation.

Boeyens, J. C. A. (1989). Learning potential: An empirical investigation. Pretoria, South Africa: Human Science Research Council.

Bosch, F. (1973). Inventarisatie, beschrijving en onderzoek m.b.t. de wijzigingen van de GATB, incl. test-hertest onderzoek (No. Pz3bRp0120) [Stock-taking, description, and research concerning the modifications of the GATB, including a test–retest study]. Utrecht, the Netherlands: Nederlandse Spoorwegen.

Carroll, J. B. (1993). Human cognitive abilities: A survey of factor analysis studies. Cambridge University Press.

Ceci, S. J. (1991). How much does schooling influence general intelligence and its cognitive components? A reassessment of the evidence. Developmental Psychology, 27, 703–722.

Christian, K., Bachnan, H. J., & Morrison, F. J. (2001). Schooling and cognitive development. In R. J. Sternberg & E. L. Grigorenko (Eds.), Environmental effects on cognitive abilities (pp. 287–335). Mahwah, NJ: Erlbaum.

Clause, C. S., Delbridge, K., Schmitt, N., Chan, D., & Jennings, D. (2001). Test preparation activities and employment test performance. Human Performance, 14, 149–167.

Cohen, J. (1988). Statistical power analysis for the behavioral sciences. Hillsdale: Lawrence Erlbaum.

Colom, R., Jung, R. E., & Haier, R. J. (in press). Finding the g-factor in brain structure using the method of correlated vectors. Intelligence.

Covin, T. A. (1977). Stability of the WISC-R for 9-year-olds with learning difficulties. Psychological Reports, 40, 1297–1298.

Coyle, T. R. (2006). Test–retest changes on scholastic aptitude tests are not related to g. Intelligence, 34, 15–27.

Cronbach, L. J. (1990). Essentials of psychological testing. New York: HarperCollins.

de Villiers, A. B. (1999). Disadvantaged students' academic performance: Analysing the zone proximal development. Unpublished DPhil thesis, University of Cape Town, South Africa.

de Wolff, C. J., & Buiten, B. (1963). Een factoranalyse van vier testbatterijen [A factor analysis of four test batteries]. Nederlands Tijdschrift Voor Psychologie, 18, 220–239.

Dolan, C. V., & Lubke, G. (2001). Viewing Spearman's hypothesis from the perspective of multigroup PCA: A comment on Schonemann's criticism. Intelligence, 29, 231–245.

Drenth, P. J. D., Petrie, J. F., & Bleichrodt, N. (1968). Handleiding bij de Amsterdamse Kinder Intelligentie Test [Manual of the Amsterdam Children Intelligence Test]. Amsterdam: Vrije Universiteit.

Elliott, C. D. (1983). British Ability Scales. Manual 2: Technical handbook. Windsor, Great Britain: NFER-Nelson.

Engelbrecht, M. (1999). Leerpotensiaal as voorspeller van akademiese sukses van universiteitsstudente [Learning potential as predictor of the academic success of university students]. Unpublished DPhil thesis, Potchefstroom University for Christian Higher Education, South Africa.

Ericsson, K. A., & Lehmann, A. C. (1996). Expert and exceptional performance: Evidence of maximal adaptation to task constraints. Annual Review of Psychology, 47, 273–305.

Evers, A., & Lucassen, W. (1991). Handleiding DAT 83: Differentiële Aanleg Testserie [Manual DAT 83: Differential Aptitude Test series]. Amsterdam: Swets.

Fleishman, E. A., & Hempel, W. E. (1955). The relation between abilities and improvement with practice in a visual discrimination reaction task. Journal of Experimental Psychology, 49, 301–312.

Flynn, J. R. (1987). Massive IQ gains in 14 nations: What IQ tests really measure. Psychological Bulletin, 101, 171–191.

Flynn, J. R. (1999). Evidence against Rushton: The genetic loading of WISC-R subtests and the causes of between-group IQ differences. Personality and Individual Differences, 26, 373–379.

Flynn, J. R. (2000). IQ gains, WISC subtests and fluid g: g theory and the relevance of Spearman's hypothesis to race. In G. R. B. J. Goode (Ed.), The nature of intelligence (pp. 202–227). New York: Wiley.

Gaydon, V. P. (1988). Predictors of performance of disadvantaged adolescents on the Soweto/Alexandra gifted child programme. Unpublished MEd dissertation, University of the Witwatersrand, South Africa.

Gottfredson, L. S. (1997). Why g matters: The complexity of everyday life. Intelligence, 24(1), 79–132.

Gottfredson, L. S. (2002). g: Highly general and highly practical. In R. J. Sternberg & E. L. Grigorenko (Eds.), The general intelligence factor: How general is it? (pp. 331–380). Mahwah, NJ: Erlbaum.

Grigorenko, E. L., & Sternberg, R. J. (1998). Dynamic testing. Psychological Bulletin, 124, 75–111.

Haeck, W., Yeld, N., Conradie, J., Robertson, N., & Shall, A. (1997). A developmental approach to mathematics testing for university admissions and course placement. Educational Studies in Mathematics, 33, 71–91.

Hartmann, P., Kruuse, N. H. S., & Nyborg, H. (in press). Testing the cross-racial generality of Spearman's hypothesis in two samples. Intelligence.

Hausknecht, J. P., Trevor, C. O., & Farr, J. L. (2002). Retaking ability tests in a selection setting: Implications for practice effects, training performance, and turnover. Journal of Applied Psychology, 87(2), 243–254.

Hunter, J. E., & Schmidt, F. L. (1990). Methods of meta-analysis. London: Sage.

Hunter, J. E., & Schmidt, F. L. (2004). Methods of meta-analysis (2nd ed.). London: Sage.

Jensen, A. R. (1980). Bias in mental testing. London: Methuen.

Jensen, A. R. (1985). The nature of the black–white difference on various psychometric tests: Spearman's hypothesis. Behavioral and Brain Sciences, 8, 193–263.

Jensen, A. R. (1998a). The g factor: The science of mental ability. London: Praeger.

Jensen, A. R. (1998b). Adoption data and two g-related hypotheses. Intelligence, 25, 1–6.

Johnson, W., Bouchard, T. J., Krueger, R. F., Jr., McGue, M., & Gottesman, I. I. (2004). Just one g: Consistent results from three test batteries. Intelligence, 32, 95–107.

Johnson, W., te Nijenhuis, J., & Bouchard, T. J., Jr. (in press). Replication of the hierarchical visual-perceptual-image rotation model in de Wolff and Buiten's (1963) battery of 46 tests of mental ability. Intelligence.

Jones, R. J. (1986). A comparison of the predictive validity of the MCAT for coached and uncoached students. Journal of Medical Education, 61, 335–338.

Kaufman, A. S., & Kaufman, N. L. (1983). K-ABC: Kaufman Assessment Battery for Children. Interpretive manual. Circle Pines, MN: AGS.


Kooij, A. P., Rolfhus, E., Wilkins, C., Yang, Z., & Zhu, J. (2005). WAIS-III Nederlandstalige bewerking: Technisch rapport hernormering [WAIS-III adaptation in Dutch: Technical report renorming]. Amsterdam: Harcourt.

Kort, W., Schittekatte, M., Dekker, P. H., Verhaeghe, P., Compaan, E. L., Bosmans, M., & Vermeir, G. (2005). WISC-III NL: Wechsler Intelligence Scale for Children, Derde Editie NL. Handleiding en verantwoording [The Dutch WISC-III: Wechsler Intelligence Scale for Children, Third Edition for the Netherlands. Manual and justification]. Amsterdam: NIP.

Kulik, J. A., Bangert-Drowns, R. L., & Kulik, C. C. (1984). Effectiveness of coaching for aptitude tests. Psychological Bulletin, 95, 179–188.

Kulik, J. A., Kulik, C. C., & Bangert, R. L. (1984). Effects of practice on aptitude and achievement test scores. American Educational Research Journal, 21, 435–447.

Lee, K. H., Choi, Y. Y., Gray, J. R., Cho, S. H., Chae, J.-H., Lee, S., et al. (2006). Neural correlates of superior intelligence: Stronger recruitment of posterior parietal cortex. NeuroImage, 29(2), 578–586.

Linn, R. L. (1977). On the treatment of multiple scores for Law School Admission Test repeaters (Report LSAC-77-4). In Law School Admission Council, Reports of LSAC Sponsored Research: Volume III, 1975–1977. Princeton, NJ: Law School Admission Council.

Lipson, L. E. (1992). Relationship of static and dynamic measures to scholastic achievement of black pupils. Unpublished MEd dissertation, University of Witwatersrand, South Africa.

Lloyd, F., & Pidgeon, D. A. (1961). An investigation into the effects of coaching on non-verbal test material with European, Indian, and African children. British Journal of Educational Psychology, 31, 145–151.

Luteijn, F., & Barelds, D. P. H. (2005). GIT2: Groninger Intelligentie Test 2 [GIT2: Groningen Intelligence Test 2]. Amsterdam: Harcourt.

Lynn, R. (1990). The role of nutrition in secular increases in intelligence. Personality and Individual Differences, 11, 273–285.

Lynn, R. (1998). In support of the nutrition theory. In U. Neisser (Ed.), The rising curve: Long-term gains in IQ and related measures (pp. 207–215). Washington, DC: American Psychological Association.

Lynn, R., Allik, J., & Irwing, P. (2004). Sex differences on three factors identified in Raven's Standard Progressive Matrices. Intelligence, 32, 411–424.

Lynn, R., & Hampson, S. (1986). The rise of national intelligence: Evidence from Britain, Japan and the USA. Personality and Individual Differences, 7, 23–32.

Matarazzo, J. D., Carmody, T. P., & Jacobs, L. D. (1980). Test–retest reliability and stability of the WAIS: A literature review with implications for clinical practice. Journal of Clinical Neuropsychology, 2(2), 89–105.

McCormick, B. K., Dunlap, W. P., Kennedy, R. S., & Jones, M. B. (1983). The effects of practice on the Armed Forces Vocational Aptitude Test Battery. US Army Research Institute for the Behavioral and Social Sciences, Technical Report 602.

Mulder, J. L., Dekker, R., & Dekker, P. H. (2004). Kaufman Intelligentietest voor adolescenten en volwassenen (KAIT). Handleiding [Kaufman Intelligence Test for adolescents and adults (KAIT): Manual]. Leiden, the Netherlands: PITS.

Murphy, R. (2002). A review of South African research in the field of dynamic assessment. Unpublished MA dissertation, University of Pretoria (available online from http://upetd.up.ac.za/thesis/available/etd-05042002-161239).

Nel, A. (1997). Die voorspelling van akademiese sukses binne konteks van 'n alternatiewe universiteitstoelatingsbeleid [The prediction of academic success within the context of an alternative policy of university admission]. Unpublished MA dissertation, Rand Afrikaans University, South Africa.

Neubauer, A. C., & Freudenthaler, H. H. (1994). Reaction time in a sentence-picture verification test and intelligence: Individual strategies and effects of extended practice. Intelligence, 19, 193–218.

Nunnally, J. C., & Bernstein, I. H. (1994). Psychometric theory (3rd ed.). New York: McGraw-Hill.

Olsen, M., & Schrader, W. B. (1959). The use of preliminary and final Scholastic Aptitude Test scores in predicting college grades (College Entrance Examination Board Research and Development Reports and Statistical Reports 59-19). Princeton, NJ: Educational Testing Service.

Ortar, G. R. (1960). Improving test validity by coaching. Educational Research, 2, 137–142.

Powers, D. E. (1985). Effects of test preparation on the validity of Graduate Admission Test. Applied Psychological Measurement, 9, 179–190.

Raven, J., Raven, J. C., & Court, J. H. (2000). Standard Progressive Matrices. Raven manual: Section 3. Oxford Psychologists Press.

Ree, M. J., & Carretta, T. R. (1994). The correlation of general cognitive ability and psychomotor tracking tests. International Journal of Selection and Assessment, 2, 209–216.

Ree, M. J., & Earles, A. A. (1991). Predicting training success: Not much more than g. Personnel Psychology, 44, 321–332.

Ree, M. J., Earles, J. A., & Teachout, M. S. (1994). Predicting job performance: Not much more than g. Journal of Applied Psychology, 79, 518–524.

Reeve, C. L., & Lam, H. (2005). The psychometric paradox of practice effects due to retesting: Measurement invariance and stable ability estimates in the face of observed score changes. Intelligence, 33, 535–549.

Resing, W. C. M. (1990). Intelligentie en leerpotentieel: Een onderzoek naar het leerpotentieel van jonge leerlingen uit het basis- en speciaal onderwijs [Intelligence and learning potential: A study into the learning potential of young students in basic and special education]. Amsterdam, the Netherlands: Swets.

Rushton, J. P. (1999). Secular gains in IQ are not related to the g factor and inbreeding depression, unlike black–white differences: A reply to Flynn. Personality and Individual Differences, 26, 381–389.

Rushton, J. P., Skuy, M., & Bons, T. A. (2004). Construct validity of Raven's Advanced Progressive Matrices for African and non-African engineering students in South Africa. International Journal of Selection and Assessment, 12(3), 220–229.

Schmidt, F. L., & Hunter, J. E. (1998). The validity and utility of selection methods in personnel psychology: Practical and theoretical implications of 85 years of research findings. Psychological Bulletin, 124(2), 262–274.

Schmidt, F. L., & Hunter, J. E. (1999). Theory testing and measurement error. Intelligence, 27(3), 183–198.

Schmidt, F. L., & Le, H. (2004). Software for the Hunter-Schmidt meta-analysis methods. University of Iowa, Department of Management and Organization, Iowa City, IA 42242.

Schroots, J. J. F., & van Alphen de Veer, R. J. (1979). LDT: Leidse Diagnostische Test. Deel 1: Handleiding [LDT: Leiden Diagnostic Test. Part 1: Manual]. Lisse, the Netherlands: Swets.

Shochet I M (1986) Manifest and potential performance inadvantaged and disadvantaged students Unpublished DPhildissertation University of the Witwatersrand South Africa

300 J te Nijenhuis et al Intelligence 35 (2007) 283ndash300

Skuy M Gewer A Osrin Y Khunou D Fridjon P amp RushtonJ P (2002) Effects of mediated learning experience on RavensMatrices scores of African and non-African university studentsin South Africa Intelligence 30 221minus232

Swanson H E amp Lussier C M (2001) A selective synthesis of theexperimental literature on dynamic assessment Review of Educa-tional Research 71 321minus363

Teasdale T W amp Owen D R (1989) Continuing secular increase inintelligence and a stable prevalence of high intelligence levelsIntelligence 13 255minus262

Tuma J M amp Appelbaum A S (1980) Reliability and practiceeffects of WISC-R IQ estimates in a normal population Educa-tional and Psychological Measurement 40 671minus678

te Nijenhuis J Tolboom E Resing W amp Bleichrodt N (2004)Does cultural background influence the intellectual performance ofchildren from immigrant groups Validity of the RAKITintelligence test for immigrant children European Journal ofPsychological Assessment 20 10minus26

te Nijenhuis J amp van der Flier H (1997) Comparability of GATBscores for immigrants and majority group members Some Dutchfindings Journal of Applied Psychology 82 675minus687

te Nijenhuis J Voskuijl O F amp Schijve N B (2001) Practice andcoaching on IQ tests Quite a lot of g International Journal ofSelection and Assessment 9 302minus308

Thorndike R L (1985) The central role of general ability inprediction Multivariate Behavioral Research 20 241minus254

Tucker L R Damarin F amp Messick S (1966) A base-free measureof change Psychometrika 31(4) 457minus473

van der Doef M P Kwint J M amp van der Koppel (1989) Wat lerenmoeilijk lerende kinderen van de WISC-R [What do children whohave difficulties in learning learn from the WISC-R] Kind enAdolescent 10 136minus141

United States Department of Labor (1970) Manual for the USTESGeneral Aptitude Test Battery Section III DevelopmentWashing-ton DC United States Department of Labor

van Geffen (1972) De betrouwbaarheid van de GATB 1002-B opbrugklasniveau [The reliability of the GATB 1002 B for the firstclass at secondary school] Catholic University Nijmegen theNetherlands Psychology of Work and Organisation

van Haasen P P de Bruyn E E J Pijl Y J Poortinga Y H LutjeSpelberg H C Vander Steene G et al (1986) WISC-RWechsler Intelligence Scale for Children-Revised Nederlandsta-lige uitgave [WISC-R Wechsler Intelligence Scale for Children-Revised Dutch edition] Lisse the Netherlands Swets

Wechsler D (1955) Manual for the Wechsler Adult IntelligenceScale New York The Psychological Corporation

Wechsler D (1967)Manual for the Wechsler Preschool and PrimaryScale of Intelligence New York The Psychological Corporation

Wechsler D (1974) Manual for the Wechsler Intelligence Scale forChildren-Revised New York The Psychological Corporation

Wechsler D (1981) WAIS-R manual Wechsler Adult IntelligenceScale-Revised New York The Psychological Corporation

Wechsler D (1997)WAIS-III Wechsler Adult Intelligence Scale-thirdedition and WMS-III Wechsler Memory Scale-third editionTechnical manual New York The Psychological Corporation

Wicherts J W Dolan C V Oosterveld P van Baal G C VBoomsma D I amp Span M M (2004) Are intelligence testsmeasurement invariant over time Investigating the nature of theFlynn effect Intelligence 32(5) 509minus537

Yeld N amp Haeck W (1997) Educational histories and academicpotential Can tests deliver Assessment and Evaluation in HigherEducation 22 5minus16

Zaaiman H (1998) Selecting students for Mathematics and ScienceThe challenge facing higher education in South Africa SouthAfrica Pretoria HSRC Publishers

Zaaiman H van der Flier H amp Thijs G D (2001) Dynamic testingin selection for an educational programme Assessing SouthAfrican performance on the Raven Progressive Matrices Inter-national Journal of Selection and Assessment 9 258minus269

Zolezzi S A (1992) Alternative selection measures for universityundergraduate admissions Unpublished MEd dissertation Uni-versity of the Witwatersrand South Africa

Zolezzi S A (1995) The effectiveness of dynamic assessment as analternative aptitude testing strategy Unpublished DPhil disserta-tion University of South Africa South Africa

297J te Nijenhuis et al Intelligence 35 (2007) 283ndash300

Nijenhuis et al (2001) Both the Dutch training and theSouth African training took 3 h but whereas in theDutch training the focus was on two different testformats the South African training dealt only with onetest format The test training by Lloyd and Pidgeon(1961) took even less time namely two half-hoursegments each focusing on one test format The effectsizes in all studies were roughly comparable Thissuggests that the methodologies employed by teNijenhuis et al and Lloyd and Pidgeon were moreefficient than those used by Skuy et al It is possible thatthe components of the mediation training that are notpresent in the other two training formats are not effectivein raising test scores and could therefore be left out Iftrue it might be possible to increase the scores on theRSPM by one SD with a relatively simple 1-h training

22 Generalizability of findings

Can these findings of hollow score gains after testndashretest test practice and Mediated Learning ExperienceTraining be generalized to other studies where training-induced score gains were found Ericsson and Lehmann(1996) reported tremendous score increases afterintensive training on numeric memory tests but thesegains did not generalize in the least to verbal memorytests Such gains on one narrow ability do not generalizeto another narrow ability clustering under the samebroad ability and are therefore hollow Similarly Jensen(1998b) showed that score gains due to adoption werenot on the g factor and were therefore most likelyhollow

Rushton (1999) argued that intergenerational score gains are not linked to g, suggesting that the Flynn effect may be empty, but he was strongly criticized by Flynn (1999, 2000). In studies on the Flynn effect, the score gains found in cross-sectional studies are largest on the RSPM (Flynn, 1987). Lynn (1998) has suggested that a substantial part of these intergenerational score gains on the RSPM is generalizable, in that it reflects higher g, but that the remaining part is hollow and should be interpreted as a schooling effect. The RSPM does require the application of the mathematical principles of addition, subtraction, progression, and the distribution of values. In the three decades (1950s–1980s) over which these increases in RSPM scores occurred, increasing proportions of 15- to 18-year-olds remained in school, where they learned math skills that they applied to the solution of matrices problems. Our findings could be interpreted as support for Lynn's hypothesis of the partial hollowness of score gains on the RSPM. Notwithstanding the high g loading of its sum score, the RSPM is quite sensitive to test–retest effects and training effects. Some studies on the Flynn effect (Lynn & Hampson, 1986; Teasdale & Owen, 1989) show that the increase in scores is largely concentrated in the lower segments of the IQ distribution. Our finding that low scorers show the largest gains after training may further support the notion that part of the Flynn effect on the RSPM is hollow. Finally, Wicherts et al.'s (2004) findings show that in some of their datasets the secular score gains are most strongly linked to broad-, narrow-, and test-specific abilities, indicating that an important part of the gains is non-generalizable.

Ceci (1991) showed that increased schooling leads to higher IQ scores, but are these gains highly specific or predominantly generalizable? It would be interesting to apply the techniques used in this study to the findings of previous intervention studies. It may be that biological interventions (such as diet, vitamin supplements, and vaccination against infectious disease), rather than psychological or educational interventions, are the most cost-effective method of producing true changes in g and broad abilities. There may also be a biological barrier between the first and the second stratum that restricts the effects of behavioral interventions to narrow abilities and test specificities.

Acknowledgement

We would like to thank Mervyn Skuy for permission to use his dataset.

Thanks to Marié de Beer, Raegan Murphy, Welko Tomic, Art Jensen, and Frank Schmidt for feedback on previous versions of this paper.

Thanks also to Arne Evers, Wilma Resing (Dutch Test Committee), and Andress Kooij (Harcourt) for their help in locating test–retest studies.

References

Ackerman, P. L. (1986). Individual differences in information processing: An investigation of intellectual abilities. Intelligence, 10, 101–139.

Ackerman, P. L. (1987). Individual differences in skill learning: An integration of psychometric and information processing skills. Psychological Bulletin, 102, 3–27.

Allalouf, A., & Ben-Shakhar, G. (1998). The effect of coaching on the predictive validity of scholastic aptitude tests. Journal of Educational Measurement, 35(1), 31–47.

Ashton, M. C., & Lee, K. (2005). Problems with the method of correlated vectors. Intelligence, 33, 431–444.

Bashi, Y. (1976). Verbal and non-verbal abilities of 4th, 6th, and 8th grade students in the Arab educational system in Israel. Jerusalem: Hebrew University, School of Education.

298 J. te Nijenhuis et al. / Intelligence 35 (2007) 283–300

Bleichrodt, N., Resing, W. C. M., Drenth, P. J. D., & Zaal, J. N. (1987). Intelligentie-meting bij kinderen: Empirische en methodologische verantwoording van de geReviseerde Amsterdamse Kinder Intelligentie Test [Measuring the intelligence of children: Empirical and methodological justification of the Revised Amsterdam Children Intelligence Test]. Lisse, the Netherlands: Swets.

Bennett, G. K., Seashore, H. G., & Wesman, A. G. (1974). Differential Aptitude Tests (5th ed.): Manual. New York: The Psychological Corporation.

Boeyens, J. C. A. (1989). Learning potential: An empirical investigation. Pretoria, South Africa: Human Science Research Council.

Bosch, F. (1973). Inventarisatie, beschrijving en onderzoek m.b.t. de wijzigingen van de GATB, incl. test-hertest onderzoek (No. Pz3bRp0120) [Stock-taking, description, and research concerning the modifications of the GATB, includes test–retest study]. Utrecht, the Netherlands: Nederlandse Spoorwegen.

Carroll, J. B. (1993). Human cognitive abilities: A survey of factor analysis studies. Cambridge University Press.

Ceci, S. J. (1991). How much does schooling influence general intelligence and its cognitive components? A reassessment of the evidence. Developmental Psychology, 27, 703–722.

Christian, K., Bachnan, H. J., & Morrison, F. J. (2001). Schooling and cognitive development. In R. J. Sternberg & E. L. Grigorenko (Eds.), Environmental effects on cognitive abilities (pp. 287–335). Mahwah, NJ: Erlbaum.

Clause, C. S., Delbridge, K., Schmitt, N., Chan, D., & Jennings, D. (2001). Test preparation activities and employment test performance. Human Performance, 14, 149–167.

Cohen, J. (1988). Statistical power analysis for the behavioral sciences. Hillsdale: Lawrence Erlbaum.

Colom, R., Jung, R. E., & Haier, R. J. (in press). Finding the g-factor in brain structure using the method of correlated vectors. Intelligence.

Covin, T. A. (1977). Stability of the WISC-R for 9-year-olds with learning difficulties. Psychological Reports, 40, 1297–1298.

Coyle, T. R. (2006). Test–retest changes on scholastic aptitude tests are not related to g. Intelligence, 34, 15–27.

Cronbach, L. J. (1990). Essentials of psychological testing. New York: HarperCollins.

de Villiers, A. B. (1999). Disadvantaged students' academic performance: Analysing the zone proximal development. Unpublished DPhil thesis, University of Cape Town, South Africa.

de Wolff, C. J., & Buiten, B. (1963). Een factoranalyse van vier testbatterijen [A factor analysis of four test batteries]. Nederlands Tijdschrift voor Psychologie, 18, 220–239.

Dolan, C. V., & Lubke, G. (2001). Viewing Spearman's hypothesis from the perspective of multigroup PCA: A comment on Schonemann's criticism. Intelligence, 29, 231–245.

Drenth, P. J. D., Petrie, J. F., & Bleichrodt, N. (1968). Handleiding bij de Amsterdamse Kinder Intelligentie Test [Manual of the Amsterdam Children Intelligence Test]. Amsterdam: Vrije Universiteit.

Elliott, C. D. (1983). British Ability Scales. Manual 2: Technical handbook. Windsor, Great Britain: NFER-Nelson.

Engelbrecht, M. (1999). Leerpotensiaal as voorspeller van akademiese sukses van universiteitsstudente [Learning potential as predictor of the academic success of university students]. Unpublished DPhil thesis, Potchefstroom University for Christian Higher Education, South Africa.

Ericsson, K. A., & Lehmann, A. C. (1996). Expert and exceptional performance: Evidence of maximal adaptation to task constraints. Annual Review of Psychology, 47, 273–305.

Evers, A., & Lucassen, W. (1991). Handleiding DAT 83 Differentiële Aanleg Testserie [Manual DAT 83 Differential Aptitude Test series]. Amsterdam: Swets.

Fleishman, E. A., & Hempel, W. E. (1955). The relation between abilities and improvement with practice in a visual discrimination reaction task. Journal of Experimental Psychology, 49, 301–312.

Flynn, J. R. (1987). Massive IQ gains in 14 nations: What IQ tests really measure. Psychological Bulletin, 101, 171–191.

Flynn, J. R. (1999). Evidence against Rushton: The genetic loading of WISC-R subtests and the causes of between-group IQ differences. Personality and Individual Differences, 26, 373–379.

Flynn, J. R. (2000). IQ gains, WISC subtests and fluid g: g theory and the relevance of Spearman's hypothesis to race. In G. R. B. J. Goode (Ed.), The nature of intelligence (pp. 202–227). New York: Wiley.

Gaydon, V. P. (1988). Predictors of performance of disadvantaged adolescents on the Soweto/Alexandra gifted child programme. Unpublished MEd dissertation, University of the Witwatersrand, South Africa.

Gottfredson, L. S. (1997). Why g matters: The complexity of everyday life. Intelligence, 24(1), 79–132.

Gottfredson, L. S. (2002). g: Highly general and highly practical. In R. J. Sternberg & E. L. Grigorenko (Eds.), The general intelligence factor: How general is it? (pp. 331–380). Mahwah, NJ: Erlbaum.

Grigorenko, E. L., & Sternberg, R. J. (1998). Dynamic testing. Psychological Bulletin, 124, 75–111.

Haeck, W., Yeld, N., Conradie, J., Robertson, N., & Shall, A. (1997). A developmental approach to mathematics testing for university admissions and course placement. Educational Studies in Mathematics, 33, 71–91.

Hartmann, P., Kruuse, N. H. S., & Nyborg, H. (in press). Testing the cross-racial generality of Spearman's hypothesis in two samples. Intelligence.

Hausknecht, J. P., Trevor, C. O., & Farr, J. L. (2002). Retaking ability tests in a selection setting: Implications for practice effects, training performance, and turnover. Journal of Applied Psychology, 87(2), 243–254.

Hunter, J. E., & Schmidt, F. L. (1990). Methods of meta-analysis. London: Sage.

Hunter, J. E., & Schmidt, F. L. (2004). Methods of meta-analysis (2nd ed.). London: Sage.

Jensen, A. R. (1980). Bias in mental testing. London: Methuen.

Jensen, A. R. (1985). The nature of the black–white difference on various psychometric tests: Spearman's hypothesis. Behavioral and Brain Sciences, 8, 193–263.

Jensen, A. R. (1998a). The g factor: The science of mental ability. London: Praeger.

Jensen, A. R. (1998b). Adoption data and two g-related hypotheses. Intelligence, 25, 1–6.

Johnson, W., Bouchard, T. J., Jr., Krueger, R. F., McGue, M., & Gottesman, I. I. (2004). Just one g: Consistent results from three test batteries. Intelligence, 32, 95–107.

Johnson, W., te Nijenhuis, J., & Bouchard, T. J., Jr. (in press). Replication of the hierarchical visual-perceptual-image rotation model in de Wolff and Buiten's (1963) battery of 46 tests of mental ability. Intelligence.

Jones, R. J. (1986). A comparison of the predictive validity of the MCAT for coached and uncoached students. Journal of Medical Education, 61, 335–338.

Kaufman, A. S., & Kaufman, N. L. (1983). K-ABC: Kaufman Assessment Battery for Children. Interpretive manual. Circle Pines, MN: AGS.

Kooij, A. P., Rolfhus, E., Wilkins, C., Yang, Z., & Zhu, J. (2005). WAIS-III Nederlandstalige bewerking: Technisch rapport hernormering [WAIS-III adaptation in Dutch: Technical report renorming]. Amsterdam: Harcourt.

Kort, W., Schittekatte, M., Dekker, P. H., Verhaeghe, P., Compaan, E. L., Bosmans, M., & Vermeir, G. (2005). WISC-III NL: Wechsler Intelligence Scale for Children, Derde Editie NL. Handleiding en verantwoording [The Dutch WISC-III: Wechsler Intelligence Scale for Children, Third Edition for the Netherlands. Manual and justification]. Amsterdam: NIP.

Kulik, J. A., Bangert-Drowns, R. L., & Kulik, C. C. (1984). Effectiveness of coaching for aptitude tests. Psychological Bulletin, 95, 179–188.

Kulik, J. A., Kulik, C. C., & Bangert, R. L. (1984). Effects of practice on aptitude and achievement test scores. American Educational Research Journal, 21, 435–447.

Lee, K. H., Choi, Y. Y., Gray, J. R., Cho, S. H., Chae, J.-H., Lee, S., et al. (2006). Neural correlates of superior intelligence: Stronger recruitment of posterior parietal cortex. NeuroImage, 29(2), 578–586.

Linn, R. L. (1977). On the treatment of multiple scores for Law School Admission Test repeaters (Report LSAC-77-4). In Law School Admission Council, Reports of LSAC sponsored research: Volume III, 1975–1977. Princeton, NJ: Law School Admission Council.

Lipson, L. E. (1992). Relationship of static and dynamic measures to scholastic achievement of black pupils. Unpublished MEd dissertation, University of Witwatersrand, South Africa.

Lloyd, F., & Pidgeon, D. A. (1961). An investigation into the effects of coaching on non-verbal test material with European, Indian, and African children. British Journal of Educational Psychology, 31, 145–151.

Luteijn, F., & Barelds, D. P. H. (2005). GIT2: Groninger Intelligentie Test 2 [GIT2: Groningen Intelligence Test 2]. Amsterdam: Harcourt.

Lynn, R. (1990). The role of nutrition in secular increases in intelligence. Personality and Individual Differences, 11, 273–285.

Lynn, R. (1998). In support of the nutrition theory. In U. Neisser (Ed.), The rising curve: Long-term gains in IQ and related measures (pp. 207–215). Washington, DC: American Psychological Association.

Lynn, R., Allik, J., & Irwing, P. (2004). Sex differences on three factors identified in Raven's Standard Progressive Matrices. Intelligence, 32, 411–424.

Lynn, R., & Hampson, S. (1986). The rise of national intelligence: Evidence from Britain, Japan and the USA. Personality and Individual Differences, 7, 23–32.

Matarazzo, J. D., Carmody, T. P., & Jacobs, L. D. (1980). Test–retest reliability and stability of the WAIS: A literature review with implications for clinical practice. Journal of Clinical Neuropsychology, 2(2), 89–105.

McCormick, B. K., Dunlap, W. P., Kennedy, R. S., & Jones, M. B. (1983). The effects of practice on the Armed Forces Vocational Aptitude Test Battery. US Army Research Institute for the Behavioral and Social Sciences, Technical Report 602.

Mulder, J. L., Dekker, R., & Dekker, P. H. (2004). Kaufman Intelligentietest voor adolescenten en volwassenen (KAIT): Handleiding [Kaufman Intelligence Test for adolescents and adults (KAIT): Manual]. Leiden, the Netherlands: PITS.

Murphy, R. (2002). A review of South African research in the field of dynamic assessment. Unpublished MA dissertation, University of Pretoria (available online from http://upetd.up.ac.za/thesis/available/etd-05042002-161239).

Nel, A. (1997). Die voorspelling van akademiese sukses binne konteks van 'n alternatiewe universiteitstoelatingsbeleid [The prediction of academic success within the context of an alternative policy of university admission]. Unpublished MA dissertation, Rand Afrikaans University, South Africa.

Neubauer, A. C., & Freudenthaler, H. H. (1994). Reaction time in a sentence-picture verification test and intelligence: Individual strategies and effects of extended practice. Intelligence, 19, 193–218.

Nunnally, J. C., & Bernstein, I. H. (1994). Psychometric theory (3rd ed.). New York: McGraw-Hill.

Olsen, M., & Schrader, W. B. (1959). The use of preliminary and final Scholastic Aptitude Test scores in predicting college grades (College Entrance Examination Board Research and Development Reports and Statistical Reports 59-19). Princeton, NJ: Educational Testing Service.

Ortar, G. R. (1960). Improving test validity by coaching. Educational Research, 2, 137–142.

Powers, D. E. (1985). Effects of test preparation on the validity of Graduate Admission Test. Applied Psychological Measurement, 9, 179–190.

Raven, J., Raven, J. C., & Court, J. H. (2000). Standard Progressive Matrices. Raven manual: Section 3. Oxford Psychologists Press.

Ree, M. J., & Carretta, T. R. (1994). The correlation of general cognitive ability and psychomotor tracking tests. International Journal of Selection and Assessment, 2, 209–216.

Ree, M. J., & Earles, A. A. (1991). Predicting training success: Not much more than g. Personnel Psychology, 44, 321–332.

Ree, M. J., Earles, J. A., & Teachout, M. S. (1994). Predicting job performance: Not much more than g. Journal of Applied Psychology, 79, 518–524.

Reeve, C. L., & Lam, H. (2005). The psychometric paradox of practice effects due to retesting: Measurement invariance and stable ability estimates in the face of observed score changes. Intelligence, 33, 535–549.

Resing, W. C. M. (1990). Intelligentie en leerpotentieel: Een onderzoek naar het leerpotentieel van jonge leerlingen uit het basis- en speciaal onderwijs [Intelligence and learning potential: A study into the learning potential of young students in basic and special education]. Amsterdam, the Netherlands: Swets.

Rushton, J. P. (1999). Secular gains in IQ are not related to the g factor and inbreeding depression, unlike black–white differences: A reply to Flynn. Personality and Individual Differences, 26, 381–389.

Rushton, J. P., Skuy, M., & Bons, T. A. (2004). Construct validity of Raven's Advanced Progressive Matrices for African and non-African engineering students in South Africa. International Journal of Selection and Assessment, 12(3), 220–229.

Schmidt, F. L., & Hunter, J. E. (1998). The validity and utility of selection methods in personnel psychology: Practical and theoretical implications of 85 years of research findings. Psychological Bulletin, 124(2), 262–274.

Schmidt, F. L., & Hunter, J. E. (1999). Theory testing and measurement error. Intelligence, 27(3), 183–198.

Schmidt, F. L., & Le, H. (2004). Software for the Hunter-Schmidt meta-analysis methods. University of Iowa, Department of Management and Organization, Iowa City, IA 42242.

Schroots, J. J. F., & van Alphen de Veer, R. J. (1979). LDT: Leidse Diagnostische Test. Deel 1: Handleiding [LDT: Leiden Diagnostic Test. Part 1: Manual]. Lisse, the Netherlands: Swets.

Shochet, I. M. (1986). Manifest and potential performance in advantaged and disadvantaged students. Unpublished DPhil dissertation, University of the Witwatersrand, South Africa.

Skuy, M., Gewer, A., Osrin, Y., Khunou, D., Fridjon, P., & Rushton, J. P. (2002). Effects of mediated learning experience on Raven's Matrices scores of African and non-African university students in South Africa. Intelligence, 30, 221–232.

Swanson, H. E., & Lussier, C. M. (2001). A selective synthesis of the experimental literature on dynamic assessment. Review of Educational Research, 71, 321–363.

Teasdale, T. W., & Owen, D. R. (1989). Continuing secular increase in intelligence and a stable prevalence of high intelligence levels. Intelligence, 13, 255–262.

Tuma, J. M., & Appelbaum, A. S. (1980). Reliability and practice effects of WISC-R IQ estimates in a normal population. Educational and Psychological Measurement, 40, 671–678.

te Nijenhuis, J., Tolboom, E., Resing, W., & Bleichrodt, N. (2004). Does cultural background influence the intellectual performance of children from immigrant groups? Validity of the RAKIT intelligence test for immigrant children. European Journal of Psychological Assessment, 20, 10–26.

te Nijenhuis, J., & van der Flier, H. (1997). Comparability of GATB scores for immigrants and majority group members: Some Dutch findings. Journal of Applied Psychology, 82, 675–687.

te Nijenhuis, J., Voskuijl, O. F., & Schijve, N. B. (2001). Practice and coaching on IQ tests: Quite a lot of g. International Journal of Selection and Assessment, 9, 302–308.

Thorndike, R. L. (1985). The central role of general ability in prediction. Multivariate Behavioral Research, 20, 241–254.

Tucker, L. R., Damarin, F., & Messick, S. (1966). A base-free measure of change. Psychometrika, 31(4), 457–473.

van der Doef, M. P., Kwint, J. M., & van der Koppel (1989). Wat leren moeilijk lerende kinderen van de WISC-R? [What do children who have difficulties in learning learn from the WISC-R?]. Kind en Adolescent, 10, 136–141.

United States Department of Labor. (1970). Manual for the USTES General Aptitude Test Battery. Section III: Development. Washington, DC: United States Department of Labor.

van Geffen (1972). De betrouwbaarheid van de GATB 1002-B op brugklasniveau [The reliability of the GATB 1002 B for the first class at secondary school]. Catholic University Nijmegen, the Netherlands: Psychology of Work and Organisation.

van Haasen, P. P., de Bruyn, E. E. J., Pijl, Y. J., Poortinga, Y. H., Lutje Spelberg, H. C., Vander Steene, G., et al. (1986). WISC-R: Wechsler Intelligence Scale for Children-Revised, Nederlandstalige uitgave [WISC-R: Wechsler Intelligence Scale for Children-Revised, Dutch edition]. Lisse, the Netherlands: Swets.

Wechsler, D. (1955). Manual for the Wechsler Adult Intelligence Scale. New York: The Psychological Corporation.

Wechsler, D. (1967). Manual for the Wechsler Preschool and Primary Scale of Intelligence. New York: The Psychological Corporation.

Wechsler, D. (1974). Manual for the Wechsler Intelligence Scale for Children-Revised. New York: The Psychological Corporation.

Wechsler, D. (1981). WAIS-R manual: Wechsler Adult Intelligence Scale-Revised. New York: The Psychological Corporation.

Wechsler, D. (1997). WAIS-III: Wechsler Adult Intelligence Scale-third edition and WMS-III: Wechsler Memory Scale-third edition. Technical manual. New York: The Psychological Corporation.

Wicherts, J. W., Dolan, C. V., Oosterveld, P., van Baal, G. C. V., Boomsma, D. I., & Span, M. M. (2004). Are intelligence tests measurement invariant over time? Investigating the nature of the Flynn effect. Intelligence, 32(5), 509–537.

Yeld, N., & Haeck, W. (1997). Educational histories and academic potential: Can tests deliver? Assessment and Evaluation in Higher Education, 22, 5–16.

Zaaiman, H. (1998). Selecting students for Mathematics and Science: The challenge facing higher education in South Africa. Pretoria, South Africa: HSRC Publishers.

Zaaiman, H., van der Flier, H., & Thijs, G. D. (2001). Dynamic testing in selection for an educational programme: Assessing South African performance on the Raven Progressive Matrices. International Journal of Selection and Assessment, 9, 258–269.

Zolezzi, S. A. (1992). Alternative selection measures for university undergraduate admissions. Unpublished MEd dissertation, University of the Witwatersrand, South Africa.

Zolezzi, S. A. (1995). The effectiveness of dynamic assessment as an alternative aptitude testing strategy. Unpublished DPhil dissertation, University of South Africa, South Africa.

298 J te Nijenhuis et al Intelligence 35 (2007) 283ndash300

Bleichrodt N Resing W C M Drenth P J D amp Zaal J N (1987)Intelligentie-meting bij kinderen Empirische en methodologischeverantwoording van de geReviseerde Amsterdamse Kinder Intelli-gentie Test [Measuring the intelligence of children Empirical andmethodological justification of the Revised Amsterdam ChildrenIntelligence Test] Lisse the Netherlands Swets

Bennett G K Seashore H G ampWesman A G (1974)DifferentialAptitude Tests (5th ed) Manual New York The PsychologicalCorporation

Boeyens J C A (1989) Learning potential An empiricalinvestigation Pretoria South Africa Human Science ResearchCouncil

Bosch F (1973) Inventarisatie beschrijving en onderzoek mbt dewijzigingen van de GATB incl test-hertest onderzoek (NoPz3bRp0120) [Stock-taking description and research concern-ing the modifications of the GATB includes testndashretest study]Utrecht the Netherlands Nederlandse Spoorwegen

Carroll J B (1993) Human cognitive abilities A survey of factoranalysis studies Cambridge University Press

Ceci S J (1991) How much does schooling influence generalintelligence and its cognitive components A reassessment of theevidence Developmental Psychology 27 703minus722

Christian K Bachnan H J amp Morrison F J (2001) Schooling andcognitive development In R J Sternberg amp E L Grigorenko(Eds) Environmental effects on cognitive abilities (pp 287minus335)Mahwah NJ Erlbaum

Clause C S Delbridge K Schmitt N Chan D amp Jennings D(2001) Test preparation activities and employment test perfor-mance Human Performance 14 149minus167

Cohen J (1988) Statistical power analysis for the behavioralsciences Hillsdale Lawrence Erlbaum

Colom R Jung R E amp Haier R J (in press) Finding the g-factor inbrain structure using the method of correlated vectors Intelligence

Covin T A (1977) Stability of the WISC-R for 9-year-olds withlearning difficulties Psychological Reports 40 1297minus1298

Coyle T R (2006) Testndashretest changes on scholastic aptitude tests arenot related to g Intelligence 34 15minus27

Cronbach L J (1990) Essentials of psychological testing New YorkHarperCollins

de Villiers AB (1999) Disadvantaged students academic perfor-mance Analysing the zone proximal developmentUnpublished DPhil thesis University of Cape Town South Africa

de Wolff C J amp Buiten B (1963) Een factoranalyse van viertestbatterijen [A factor analysis of four test batteries] NederlandsTijdschrift Voor Psychologie 18 220minus239

Dolan C V amp Lubke G (2001) Viewing Spearmans hypothesisfrom the perspective of multigroup PCA A comment onSchonemanns criticism Intelligence 29 231minus245

Drenth P J D Petrie J F amp Bleichrodt N (1968) Handleiding bijde Amsterdamse Kinder Intelligentie Test [Manual of theAmsterdam Children Intelligence Test] Amsterdam VrijeUniversiteit

Elliott C D (1983) British Ability Scales Manual 2 TechnicalHandbook Windsor Great-Britain NFER-Nelson

Engelbrecht M (1999) Leerpotensiaal as voorspeller van akademi-ese sukses van universiteitsstudente [Learning potential aspredictor of the academic success of university students]Unpublished D Phil thesis Potchefstroom University forChristian Higher Education South Africa

Ericsson K A amp Lehmann A C (1996) Expert and exceptionalperformance Evidence of maximal adaptation to task constraintsAnnual Review of Psychology 47 273minus305

Evers A amp Lucassen W (1991) Handleiding DAT 83 DifferentieumlleAanleg Testserie [Manual DAT83 Differential Aptitude Testseries] Amsterdam Swets

Fleishman E A amp Hempel W E (1955) The relation betweenabilities and improvement with practice in a visual discriminationreaction task Journal of Experimental Psychology 49 301minus312

Flynn J R (1987) Massive IQ gains in 14 nations What IQ testsreally measure Psychological Bulletin 101 171minus191

Flynn J R (1999) Evidence against Rushton The genetic loading ofWISC-R subtests and the causes of between-group IQ differencesPersonality and Individual Differences 26 373minus379

Flynn J R (2000) IQ gains WISC subtests and fluid g g theory andthe relevance of Spearmans hypothesis to race In G R B JGoode (Ed) The nature of intelligence (pp 202minus227) New YorkWiley

Gaydon VP (1988) Predictors of performance of disadvantagedadolescents on the SowetoAlexandra gifted child programmeUnpublished M Ed dissertation University of the WitwatersrandSouth Africa

Gottfredson L S (1997) Why g matters The complexity of everydaylife Intelligence 24(1) 79minus132

Gottfredson L S (2002) g Highly general and highly practical In RJ Sternberg amp E L Grigorenko (Eds) The general intelligencefactor How general is it (pp 331minus380) Mahwah NJ Erlbaum

Grigorenko E L amp Sternberg R J (1998) Dynamic testing Psy-chological Bulletin 124 75minus111

HaeckW Yeld N Conradie J Robertson N amp Shall A (1997) Adevelopmental approach to mathematics testing for universityadmissions and course placement Educational Studies in Mathe-matics 33 71minus91

Hartmann P Kruuse NHS amp Nyborg H (in press) Testing thecross-racial generality of Spearmans hypothesis in two samplesIntelligence

Hausknecht J P Trevor C O amp Farr J L (2002) Retaking abilitytests in a selection setting Implications for practice effects trainingperformance and turnover Journal of Applied Psychology 87(2)243minus254

Hunter J E amp Schmidt F L (1990) Methods of meta-analysisLondon Sage

Hunter J E amp Schmidt F L (2004) Methods of meta-analysis (2nded) London Sage

Jensen A R (1980) Bias in mental testing London MethuenJensen A R (1985) The nature of the blackndashwhite difference on

various psychometric tests Spearmans hypothesis Behavioraland Brain Sciences 8 193minus263

Jensen A R (1998a) The g factor The science of mental abilityLondon Praeger

Jensen A R (1998b) Adoption data and two g-related hypothesesIntelligence 25 1minus6

Johnson W Bouchard T J Krueger R F Jr McGue M ampGottesman I I (2004) Just one g Consistent results from threetest batteries Intelligence 32 95minus107

Johnson W te Nijenhuis J amp Bouchard TJ Jr (in press)Replication of the hierarchical visual-perceptual-image rotationmodel in de Wolff and Buitens (1963) battery of 46 tests of mentalability Intelligence

Jones R J (1986) A comparison of the predictive validity of theMCAT for coached and uncoached students Journal of MedicalEducation 61 335minus338

Kaufman A S amp Kaufman N L (1983) K-ABC KaufmanAssessment Battery for Children Interpretive manual CirclePines MN AGS

299J te Nijenhuis et al Intelligence 35 (2007) 283ndash300

Kooij A P Rolfhus E Wilkins C Yang Z amp Zhu J (2005)WAIS-III Nederlandstalige bewerking Technisch rapport hernor-mering [WAIS-III adoptation in Dutch Technical report renorm-ing] Amsterdam Harcourt

Kort W Schittekatte M Dekker P H Verhaeghe P Compaan EL Bosmans M amp Vermeir G (2005) WISC-IIINL WechslerIntelligence Scale for Children Derde Editie NL Handleiding enverantwoording [The Dutch WISC-III Wechsler Intelligence Scalefor Children Third Edition for the Netherlands Manual andjustification] Amsterdam NIP

Kulik J A Bangert-Drowns R L amp Kulik C C (1984)Effectiveness of coaching for aptitude tests PsychologicalBulletin 95 179minus188

Kulik J A Kulik C C amp Bangert R L (1984) Effects of practiceon aptitude and achievement test scores American EducationalResearch Journal 21 435minus447

Lee K H Choi Y Y Gray J R Cho S H Chae J -H Lee S etal (2006) Neural correlates of superior intelligence Strongerrecruitment of posterior parietal cortex NeuroImage 29(2)578minus586

Linn R L (1977) On the treatment of multiple scores for LawSchool Admission Test repeaters (Report LSAC-77-4) In LawSchool Admission Council Reports of LSAC Sponsored ResearchVolume III 1975-1977 Princeton NJ Law School AdmissionCouncil

Lipson LE (1992) Relationship of static and dynamic measures toscholastic achievement of black pupils Unpublished MEddissertation University of Witwatersrand South Africa

Lloyd F amp Pidgeon D A (1961) An investigation into the effects ofcoaching on non-verbal test material with European Indian andAfrican children British Journal of Educational Psychology 31145minus151

Luteijn F amp Barelds D P H (2005) GIT2 Groninger IntelligentieTest 2 [GIT2 Groningen Intelligence Test 2] Amsterdam Harcourt

Lynn R (1990) The role of nutrition in secular increases inintelligence Personality and Individual Differences 11 273minus285

Lynn R (1998) In support of the nutrition theory In U Neisser(Ed) The rising curve Long-term gains in IQ and relatedmeasures (pp 207minus215) Washington DC American Psycholo-gical Association

Lynn R Allik J amp Irwing P (2004) Sex differences on three factorsidentified in Ravens Standard Progressive Matrices Intelligence32 411minus424

Lynn R amp Hampson S (1986) The rise of national intelligenceEvidence from Britain Japan and the USA Personality andIndividual Differences 7 23minus32

Matarazzo J D Carmody T P amp Jacobs L D (1980) Testndashretestreliability and stability of the WAIS A literature review withimplications for clinical practice Journal of Clinical Neuropsy-chology 2(2) 89minus105

McCormick BK Dunlap WP Kennedy RS amp Jones MB(1983) The effects of practice on the Armed Forces VocationalAptitude Test Battery US Army Research Institute for theBehavioral and Social Sciences Technical Report 602

Mulder J L Dekker R amp Dekker P H (2004) KaufmanIntelligentietest voor adolesecenten en volwassenen (KAIT)Handleiding [Kaufman Intelligence test for adolescents and adults(KAIT) Manual] Leiden the Netherlands PITS

Murphy R (2002) A review of South African research in the fieldof dynamic assessment Unpublished MA dissertation Universityof Pretoria (available online from httpupetdupaczathesisavailableetd-05042002-161239)

Nel A (1997)Die voorspelling van akademiese sukses binne kontekstvan n alternatiewe universiteitstoelatingsbeleid [The predictionof academic success within the context of an alternative policy ofuniversity admission] Unpublished MA dissertation RandAfrikaans University South Africa

Neubauer A C amp Freudenthaler H H (1994) Reaction time in asentence-picture verification test and intelligence Individualstrategies and effects of extended practice Intelligence 19193minus218

Nunnally J C amp Bernstein I H (1994) Psychometric theory(3rd ed) New York McGraw-Hill

Olsen M amp Schrader W B (1959) The use of preliminary and finalScholastic Aptitude Test scores in predicting college grades(College Entrance Examination Board Research and DevelopmentReports and Statistical Reports 59-19 Princeton NJ Educa-tional Testing Service

Ortar G R (1960) Improving test validity by coaching EducationalResearch 2 137minus142

Powers D E (1985) Effects of test preparation on the validity ofGraduate Admission Test Applied Psychological Measurement 9179minus190

Raven J Raven J C amp Court J H (2000) Standard ProgressiveMatrices Raven manual Section 3 Oxford Psychologists Press

Ree M J amp Carretta T R (1994) The correlation of generalcognitive ability and psychomotor tracking tests InternationalJournal of Selection and Assessment 2 209minus216

Ree M J amp Earles A A (1991) Predicting training success Notmuch more than g Personnel Psychology 44 321minus332

Ree, M. J., Earles, J. A., & Teachout, M. S. (1994). Predicting job performance: Not much more than g. Journal of Applied Psychology, 79, 518–524.

Reeve, C. L., & Lam, H. (2005). The psychometric paradox of practice effects due to retesting: Measurement invariance and stable ability estimates in the face of observed score changes. Intelligence, 33, 535–549.

Resing, W. C. M. (1990). Intelligentie en leerpotentieel: Een onderzoek naar het leerpotentieel van jonge leerlingen uit het basis- en speciaal onderwijs [Intelligence and learning potential: A study into the learning potential of young students in basic and special education]. Amsterdam, the Netherlands: Swets.

Rushton, J. P. (1999). Secular gains in IQ are not related to the g factor and inbreeding depression – unlike black–white differences: A reply to Flynn. Personality and Individual Differences, 26, 381–389.

Rushton, J. P., Skuy, M., & Bons, T. A. (2004). Construct validity of Raven's Advanced Progressive Matrices for African and non-African engineering students in South Africa. International Journal of Selection and Assessment, 12(3), 220–229.

Schmidt, F. L., & Hunter, J. E. (1998). The validity and utility of selection methods in personnel psychology: Practical and theoretical implications of 85 years of research findings. Psychological Bulletin, 124(2), 262–274.

Schmidt, F. L., & Hunter, J. E. (1999). Theory testing and measurement error. Intelligence, 27(3), 183–198.

Schmidt, F. L., & Le, H. (2004). Software for the Hunter-Schmidt meta-analysis methods. University of Iowa, Department of Management and Organization, Iowa City, IA 52242.

Schroots, J. J. F., & van Alphen de Veer, R. J. (1979). LDT: Leidse Diagnostische Test. Deel 1: Handleiding [LDT: Leiden Diagnostic Test. Part 1: Manual]. Lisse, the Netherlands: Swets.

Shochet, I. M. (1986). Manifest and potential performance in advantaged and disadvantaged students. Unpublished DPhil dissertation, University of the Witwatersrand, South Africa.

300 J. te Nijenhuis et al. / Intelligence 35 (2007) 283–300

Skuy, M., Gewer, A., Osrin, Y., Khunou, D., Fridjon, P., & Rushton, J. P. (2002). Effects of mediated learning experience on Raven's Matrices scores of African and non-African university students in South Africa. Intelligence, 30, 221–232.

Swanson, H. L., & Lussier, C. M. (2001). A selective synthesis of the experimental literature on dynamic assessment. Review of Educational Research, 71, 321–363.

Teasdale, T. W., & Owen, D. R. (1989). Continuing secular increase in intelligence and a stable prevalence of high intelligence levels. Intelligence, 13, 255–262.

Tuma, J. M., & Appelbaum, A. S. (1980). Reliability and practice effects of WISC-R IQ estimates in a normal population. Educational and Psychological Measurement, 40, 671–678.

te Nijenhuis, J., Tolboom, E., Resing, W., & Bleichrodt, N. (2004). Does cultural background influence the intellectual performance of children from immigrant groups? Validity of the RAKIT intelligence test for immigrant children. European Journal of Psychological Assessment, 20, 10–26.

te Nijenhuis, J., & van der Flier, H. (1997). Comparability of GATB scores for immigrants and majority group members: Some Dutch findings. Journal of Applied Psychology, 82, 675–687.

te Nijenhuis, J., Voskuijl, O. F., & Schijve, N. B. (2001). Practice and coaching on IQ tests: Quite a lot of g. International Journal of Selection and Assessment, 9, 302–308.

Thorndike, R. L. (1985). The central role of general ability in prediction. Multivariate Behavioral Research, 20, 241–254.

Tucker, L. R., Damarin, F., & Messick, S. (1966). A base-free measure of change. Psychometrika, 31(4), 457–473.

van der Doef, M. P., Kwint, J. M., & van der Koppel (1989). Wat leren moeilijk lerende kinderen van de WISC-R? [What do children who have difficulties in learning learn from the WISC-R?] Kind en Adolescent, 10, 136–141.

United States Department of Labor. (1970). Manual for the USTES General Aptitude Test Battery. Section III: Development. Washington, DC: United States Department of Labor.

van Geffen (1972). De betrouwbaarheid van de GATB 1002-B op brugklasniveau [The reliability of the GATB 1002-B for the first class at secondary school]. Catholic University Nijmegen, the Netherlands, Psychology of Work and Organisation.

van Haasen, P. P., de Bruyn, E. E. J., Pijl, Y. J., Poortinga, Y. H., Lutje Spelberg, H. C., Vander Steene, G., et al. (1986). WISC-R: Wechsler Intelligence Scale for Children-Revised, Nederlandstalige uitgave [WISC-R: Wechsler Intelligence Scale for Children-Revised, Dutch edition]. Lisse, the Netherlands: Swets.

Wechsler, D. (1955). Manual for the Wechsler Adult Intelligence Scale. New York: The Psychological Corporation.

Wechsler, D. (1967). Manual for the Wechsler Preschool and Primary Scale of Intelligence. New York: The Psychological Corporation.

Wechsler, D. (1974). Manual for the Wechsler Intelligence Scale for Children-Revised. New York: The Psychological Corporation.

Wechsler, D. (1981). WAIS-R manual: Wechsler Adult Intelligence Scale-Revised. New York: The Psychological Corporation.

Wechsler, D. (1997). WAIS-III: Wechsler Adult Intelligence Scale-third edition and WMS-III: Wechsler Memory Scale-third edition. Technical manual. New York: The Psychological Corporation.

Wicherts, J. M., Dolan, C. V., Oosterveld, P., van Baal, G. C. M., Boomsma, D. I., & Span, M. M. (2004). Are intelligence tests measurement invariant over time? Investigating the nature of the Flynn effect. Intelligence, 32(5), 509–537.

Yeld, N., & Haeck, W. (1997). Educational histories and academic potential: Can tests deliver? Assessment and Evaluation in Higher Education, 22, 5–16.

Zaaiman, H. (1998). Selecting students for Mathematics and Science: The challenge facing higher education in South Africa. Pretoria, South Africa: HSRC Publishers.

Zaaiman, H., van der Flier, H., & Thijs, G. D. (2001). Dynamic testing in selection for an educational programme: Assessing South African performance on the Raven Progressive Matrices. International Journal of Selection and Assessment, 9, 258–269.

Zolezzi, S. A. (1992). Alternative selection measures for university undergraduate admissions. Unpublished MEd dissertation, University of the Witwatersrand, South Africa.

Zolezzi, S. A. (1995). The effectiveness of dynamic assessment as an alternative aptitude testing strategy. Unpublished DPhil dissertation, University of South Africa, South Africa.


Kooij, A. P., Rolfhus, E., Wilkins, C., Yang, Z., & Zhu, J. (2005). WAIS-III Nederlandstalige bewerking: Technisch rapport hernormering [WAIS-III adaptation in Dutch: Technical report renorming]. Amsterdam: Harcourt.

Kort, W., Schittekatte, M., Dekker, P. H., Verhaeghe, P., Compaan, E. L., Bosmans, M., & Vermeir, G. (2005). WISC-III NL: Wechsler Intelligence Scale for Children, Derde Editie NL. Handleiding en verantwoording [The Dutch WISC-III: Wechsler Intelligence Scale for Children, Third Edition for the Netherlands. Manual and justification]. Amsterdam: NIP.

Kulik, J. A., Bangert-Drowns, R. L., & Kulik, C. C. (1984). Effectiveness of coaching for aptitude tests. Psychological Bulletin, 95, 179–188.

Kulik, J. A., Kulik, C. C., & Bangert, R. L. (1984). Effects of practice on aptitude and achievement test scores. American Educational Research Journal, 21, 435–447.

Lee, K. H., Choi, Y. Y., Gray, J. R., Cho, S. H., Chae, J.-H., Lee, S., et al. (2006). Neural correlates of superior intelligence: Stronger recruitment of posterior parietal cortex. NeuroImage, 29(2), 578–586.

Linn, R. L. (1977). On the treatment of multiple scores for Law School Admission Test repeaters (Report LSAC-77-4). In Law School Admission Council, Reports of LSAC Sponsored Research: Volume III, 1975–1977. Princeton, NJ: Law School Admission Council.

Lipson, L. E. (1992). Relationship of static and dynamic measures to scholastic achievement of black pupils. Unpublished MEd dissertation, University of the Witwatersrand, South Africa.

Lloyd, F., & Pidgeon, D. A. (1961). An investigation into the effects of coaching on non-verbal test material with European, Indian and African children. British Journal of Educational Psychology, 31, 145–151.

Luteijn, F., & Barelds, D. P. H. (2005). GIT2: Groninger Intelligentie Test 2 [GIT2: Groningen Intelligence Test 2]. Amsterdam: Harcourt.

Lynn, R. (1990). The role of nutrition in secular increases in intelligence. Personality and Individual Differences, 11, 273–285.

Lynn, R. (1998). In support of the nutrition theory. In U. Neisser (Ed.), The rising curve: Long-term gains in IQ and related measures (pp. 207–215). Washington, DC: American Psychological Association.

Lynn, R., Allik, J., & Irwing, P. (2004). Sex differences on three factors identified in Raven's Standard Progressive Matrices. Intelligence, 32, 411–424.

Lynn, R., & Hampson, S. (1986). The rise of national intelligence: Evidence from Britain, Japan and the USA. Personality and Individual Differences, 7, 23–32.

Matarazzo, J. D., Carmody, T. P., & Jacobs, L. D. (1980). Test–retest reliability and stability of the WAIS: A literature review with implications for clinical practice. Journal of Clinical Neuropsychology, 2(2), 89–105.

McCormick, B. K., Dunlap, W. P., Kennedy, R. S., & Jones, M. B. (1983). The effects of practice on the Armed Forces Vocational Aptitude Test Battery. US Army Research Institute for the Behavioral and Social Sciences, Technical Report 602.

Mulder, J. L., Dekker, R., & Dekker, P. H. (2004). Kaufman Intelligentietest voor adolescenten en volwassenen (KAIT): Handleiding [Kaufman Intelligence Test for adolescents and adults (KAIT): Manual]. Leiden, the Netherlands: PITS.

Murphy, R. (2002). A review of South African research in the field of dynamic assessment. Unpublished MA dissertation, University of Pretoria (available online from http://upetd.up.ac.za/thesis/available/etd-05042002-161239).

Nel, A. (1997). Die voorspelling van akademiese sukses binne konteks van 'n alternatiewe universiteitstoelatingsbeleid [The prediction of academic success within the context of an alternative policy of university admission]. Unpublished MA dissertation, Rand Afrikaans University, South Africa.

Neubauer, A. C., & Freudenthaler, H. H. (1994). Reaction time in a sentence-picture verification test and intelligence: Individual strategies and effects of extended practice. Intelligence, 19, 193–218.
