INFERRING PROCESSES DURING INTRODUCTION AND RANGE EXPANSION

Combining genetic, historical and geographical data to reconstruct the dynamics of bioinvasions: application to the cane toad Bufo marinus

ARNAUD ESTOUP,* STUART J.E. BAIRD,*† NICOLAS RAY,‡§ MATHIAS CURRAT,‡¶ JEAN-MARIE CORNUET,* FILIPE SANTOS,* MARK A. BEAUMONT** and LAURENT EXCOFFIER‡

*INRA, UMR CBGP (INRA / IRD / Cirad / Montpellier SupAgro), Campus International de Baillarguet, CS 30016, F-34988 Montferrier-sur-Lez Cedex, France, †Centro de Investigação em Biodiversidade e Recursos Genéticos (CIBIO / UP), Campus Agrário de Vairão, 4485-661 Vairão, Portugal, ‡Computational and Molecular Population Genetics Lab (CMPG), Zoological Institute, University of Bern, Baltzerstrasse 6, 3012 Bern, Switzerland, §EnviroSPACE Laboratory, Climate Change and Climate Impacts, Institute for Environmental Sciences, University of Geneva, Battelle – Building D, 7 route de Drize, 1227 Carouge, Switzerland, ¶Laboratory of Anthropology, Genetics and Peopling History (AGP), Department of Anthropology and Ecology, University of Geneva, 12 rue Gustave-Revilliod, CH-1227 Geneva, Switzerland, **School of Animal and Microbial Sciences, University of Reading, Whiteknights, Reading RG6 6AJ, UK

Abstract

We developed a spatially explicit model of a bioinvasion and used an approximate Bayesian computation (ABC) framework to make various inferences from a combination of genetic (microsatellite genotypes), historical (first observation dates) and geographical (spatial coordinates of introduction and sampled sites) information. Our method aims to discriminate between alternative introduction scenarios and to estimate posterior densities of demographically relevant parameters of the invasive process. The performance of our landscape-ABC method is assessed using simulated data sets differing in their information content (genetic and/or historical data). We apply our methodology to the recent introduction and spatial expansion of the cane toad, Bufo marinus, in northern Australia. We find that, at least in the context of cane toad invasion, historical data are more informative than genetic data for discriminating between introduction scenarios. However, the combination of historical and genetic data provides the most accurate estimates of demographic parameters. For the cane toad, we find some evidence for a strong bottleneck prior to introduction, a small initial number of founder individuals (about 15), a large population growth rate (about 400% per generation), a standard deviation of dispersal distance of 19 km per generation and a high invasion speed at equilibrium (50 km per year). Our approach strengthens the application of the ABC method to the field of bioinvasion by allowing statistical inferences to be made on the introduction and the spatial expansion dynamics of invasive species using a combination of various relevant sources of information.

Keywords: Bayesian inference, bioinvasion, demographic inferences, genetic data, historical data, spatial expansion

Received 11 December 2009; revision received 24 February 2010, 11 March 2010; accepted 17 March 2010

Correspondence: Arnaud Estoup, Fax: 33 4 99 62 33 38; E-mail: [email protected]

© 2010 Blackwell Publishing Ltd

Molecular Ecology Resources (2010) 10, 886–901 doi: 10.1111/j.1755-0998.2010.02882.x

Introduction

Species invasions, commonly defined as the successful establishment and spread of species outside their native range, have frequently occurred throughout the earth’s history, yet their current extent and frequency are unprecedented (Sax et al. 2005). Such invasions can have detrimental consequences, including erosion of biodiversity and disruption of invaded ecosystem function, public health risks and damages to agriculture and fisheries (e.g. Lodge 1993; Ruiz et al. 2000; Pimentel et al. 2001). At the same time, species invasions represent unparalleled opportunities to provide insights into fundamental issues in evolutionary biology, ecology and biogeography, which could lead to improved management strategies (reviewed in Sax et al. 2005 and Cadotte et al. 2006). Species invasions indeed allow us to measure many processes (e.g. dispersal and genetic change) that are difficult to study with long-established native species or by performing field or laboratory experiments (Rice & Sax 2005). There is an urgent need for methods that can help in understanding invasive processes, such as identifying the biological and environmental parameters that favour the expansion of species across a particular landscape.

Models and descriptions of the spatial dynamics of invading species have a long history. Early results were obtained using diffusion or difference equation models (reviewed in Okubo & Levin 2001). A variety of other classes of models have subsequently been studied (e.g. individual-based models), showing that rates of expansion can be either linear or accelerating and that invasion fronts can be smooth or patchy depending on assumptions about individual movements, demography, adaptation and environmental structure (reviewed in Hasting et al. 2005). So far, inference on invasive species dynamics has focused on the expansion rate and the size of a favourable habitat using historical data (i.e. dates of first observation of the species over successive years translated into occupied area; e.g. Easteal & Floyd 1986; Lubina & Levin 1988; Weber 1998; Gilbert et al. 2004; see also Pinhasi et al. 2005 for the use of radiocarbon dates from Neolithic sites to trace the origin and spread of agriculture in Europe). Historical data may however be incomplete, inaccurate or nonexistent for the application of such treatments, and alternative/complementary methods to infer the spatial dynamics of bioinvasions are needed. Genetic data are potentially informative about the origin and the movements of individuals, as well as their density (Rousset 2004). Inferential methods for the dynamics of invasive species would therefore benefit from taking into account an ensemble of genetic, historical and geographical information.

The spatial nature of expansion processes makes it difficult however to define an explicit, realistic and usable model adapted to both genetic and historical data. Only a few attempts have been made in this direction (e.g. Estoup et al. 2004; Hamilton et al. 2005a,b; Ray et al. 2005; Francois et al. 2008). A drawback of former studies is that the spatial scale and hence the number of demes modelled to represent a continuous area of spatial expansion remained relatively arbitrary. For instance, in a coalescent-based study, the spatial expansion of the cane toad in northern and eastern Australia has been simply modelled along a linear set of discrete demes connected by direct migration each generation (Estoup et al. 2004). Dates of first observation were however used to fix the age of sampled demes, thus giving information on the dynamics of the expansion process. The biological justification of such discrete population coalescent simulation approaches may be questioned however because natural dispersal is a continuous rather than a discrete process (Barton et al. 2002). A solution to this problem has been proposed by S.J.E. Baird and F. Santos in a companion paper also published in this special issue, allowing one to approximate continuous dispersal on a discrete lattice while providing results about coalescent times compatible with classical diffusion models.

It is very difficult to infer the demographic parameters of a spatial expansion from genetic (and historical) data, irrespective of the underlying spatial model. This is attributable to the complexity of the processes involved, which makes mathematical solution for the likelihood of the model very difficult. The approximate Bayesian computation (ABC) method has been developed to circumvent this difficulty (Tavare et al. 1997; Pritchard et al. 1999; Beaumont et al. 2002; Marjoram et al. 2003; Wegmann et al. 2009; Beaumont et al. 2010; Blum & Francois 2010). It substitutes an algorithmic solution (a simulation) for the explicit likelihood equation, using summary statistics to compare observations to simulation. The ABC method has already been applied to various problems in population genetics, as well as in epidemiology and palaeontology (e.g. Hamilton et al. 2005a,b; Shriner et al. 2006; Tanaka et al. 2006; Hickerson & Meyer 2008; Verdu et al. 2009). To apply ABC methods in the particular context of a bioinvasion in a continuous landscape, one needs to have a computer program that allows simulations of both genetic and historical data sets from prior distributions of parameters using an appropriate model of spatial expansion. Here, we used a modified version of the program SPLATCHE (Currat et al. 2004), which can simulate both types of data under a model of 2D spatial expansion starting from a given location.
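The way ABC replaces a likelihood with simulation can be illustrated on a toy problem. The sketch below is a plain rejection sampler and is not the procedure used in this study: the simulator (the mean of 20 Normal draws), the Uniform[0, 10] prior, the tolerance and all function names are assumptions chosen only to show the draw, simulate, compare and accept loop.

```python
import random
import statistics


def simulate_summary(theta, rng, n=20):
    """Toy simulator (assumption): summary statistic = mean of n Normal(theta, 1) draws."""
    return statistics.fmean(rng.gauss(theta, 1.0) for _ in range(n))


def abc_rejection(observed, n_draws=5000, tolerance=0.2, seed=42):
    """Keep the prior draws whose simulated summary lies within `tolerance` of the observed one."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(0.0, 10.0)            # 1. draw a parameter value from the prior
        simulated = simulate_summary(theta, rng)  # 2. simulate data and summarize them
        if abs(simulated - observed) <= tolerance:
            accepted.append(theta)                # 3. accept if close to the observation
    return accepted


posterior_sample = abc_rejection(observed=3.0)
posterior_mean = statistics.fmean(posterior_sample)
```

The accepted values approximate a sample from the posterior distribution; no likelihood is ever evaluated, only simulated and observed summary statistics are compared.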

In this study, we have implemented SPLATCHE with a spatially explicit model of population expansion to simulate both genetic and historical data during bioinvasion processes. As a test case, we used the invasion of northern Australia by the cane toad Bufo marinus. This species is by far the most successful of introduced amphibian species, and it has one of the most extensive documented histories of introduction of any vertebrate (reviewed in Easteal 1981). It is native to the American tropics and was introduced in Australia in 1935 as a biocontrol agent. Since then, the cane toad has spread across more than one million km², and its range is still expanding in northern Australia (Fig. 1). The ABC method was used to evaluate the probability of alternative introduction scenarios differing by the date and location of first introduction and to estimate relevant demographic parameters of the invasive process. A unique advantage of our novel approach is that we are able to contrast and combine two separate sources of information within the same analysis: historical information on dates and locations where toads were first sighted and microsatellite genetic information from contemporary samples. To evaluate the performance of our landscape-ABC method, we also compare results against simulated data sets.

Materials and methods

Spatial model

Dispersal. We used a spatially explicit model of a range expansion in a two-dimensional (2D) continuous and ecologically homogeneous landscape, with dispersal scaled by the intergenerational (or birth–reproduction) distance. If the movement of an individual between birth and reproduction is a random-walk process made up of a series of steps drawn from any arbitrary distribution, then its limiting distribution is Gaussian. Obviously, this model will be poorest for organisms with only a few if not a single step of dispersal, especially when the underlying distribution of steps has for example a high kurtosis (a high kurtosis distribution has a sharper peak and longer, fatter tails, while a low kurtosis distribution has a more rounded peak and shorter thinner tails). In agreement with this, empirical studies show that parent–offspring distributions often have high kurtosis (Rousset 2004). However, repeated over generations, any such (non-Levy) distribution has a multi-generational convolution that tends to a Gaussian distribution (e.g. Baird & Santos’ paper, this special issue). With respect to the species more specifically studied here (B. marinus), both mark-recapture and radiotracking studies have shown that cane toads rapidly move away from the location where they are captured (with nightly movements ranging from 0 to 1300 m) and show highly stochastic multi-step movement (Schwarzkopf & Alford 2002). Therefore, beside the abovementioned multi-generational convolution argument, the Gaussian distribution with standard deviation $\sigma$ km (the scale of movement) seems a good candidate in this species, even for intergenerational movement, and was hence chosen for further treatments. Note that the kurtosis changes with X, the maximum distance an individual can move in one generation. X is measured as a multiple of the scale of movement, i.e. the maximum distance is $X\sigma$ km. We do not report kurtosis results separately as the kurtosis is confounded with X, and X is more immediately interpretable.

Fig. 1 The range of the cane toad in Australia (a) and the sampled sites in the northern expansion area (b). In (a), the grey area corresponds to the distribution range in 1999. In (b), the red triangles are sampled sites with dates of first observation, and blue circles are plausible introduction sites. Because the expansion area is strongly constrained geographically and ecologically by the ocean in the north and desert zones in the south, the potential range was represented by a 1400 × 250 km rectangle of uniformly suitable habitat with reflective edges (in red colour). N = Normanton, W = Westmoreland, B = Boroloola, T = Nathan river, R = Roper bar, M = McMinn station, D = Duck pond, K = Moroak station, E = Elsey station.
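The argument that a multi-step random walk tends to a Gaussian displacement, whatever the kurtosis of the individual steps, can be checked numerically. The following is a minimal one-dimensional sketch: the Laplace step distribution (excess kurtosis 3), the sample size and the function names are assumptions made for illustration and are not taken from the study.

```python
import random


def excess_kurtosis(values):
    """Sample excess kurtosis (0 for a Gaussian, positive for peaked, fat-tailed data)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / m2 ** 2 - 3.0


def net_displacement(rng, n_steps):
    """Sum of n_steps Laplace-distributed steps (difference of two exponentials)."""
    return sum(rng.expovariate(1.0) - rng.expovariate(1.0) for _ in range(n_steps))


def kurtosis_after(n_steps, n_individuals=20000, seed=1):
    """Excess kurtosis of the birth-to-reproduction displacement after n_steps steps."""
    rng = random.Random(seed)
    return excess_kurtosis([net_displacement(rng, n_steps) for _ in range(n_individuals)])


single_step = kurtosis_after(1)    # few steps: strongly leptokurtic
twenty_steps = kurtosis_after(20)  # many steps: close to Gaussian
```

For independent, identically distributed steps the excess kurtosis of the sum decreases as 1/n, so the single-step value should be near 3 and the twenty-step value near 0.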

Coalescent process in continuous space. Performing (backward in time) coalescent simulations in a 2D continuous space is difficult, because the random walks of two independent lineages in continuous space have zero probability of meeting and thus cannot coalesce. To avoid this problem, Slatkin & Maddison (1990) suggested that two lineages automatically coalesce if they are closer than an arbitrary small distance from each other, but the biological justification of this procedure has been questioned (Barton et al. 2002). An alternative is to simulate random walks backward in time in a discrete space, where they are certain to meet. This discrete space can be designed such that the properties of the random walks descending from simulated coalescent events match the expectations of a particular continuous space dispersal model. For a Gaussian parent–offspring dispersal model, the resulting spatial coalescent can, for example, approximate an extension of Wright’s neighbourhood size approach (Barton & Wilson 1995, 1996). We used a modification of Kimura’s stepping stone model (a discrete representation of space; Kimura & Weiss 1964) to produce lattices on which coalescence follows these expectations (Baird & Santos’s paper, this special issue) and Bayesian averaging based on Buffon’s needle game (Buffon 1777; Solomon 1978) to map our observations from a continuous field area onto this discrete space reconstruction lattice (Baird & Santos’s paper, this special issue, and see Fig. 2 for an illustration). Only key features of the spatial model are described here (see Baird & Santos’s paper, this special issue, for details). The reconstruction lattice is a model of discrete space. The coordinates of sampling localities, the density, the dispersal scale $\sigma$ and movement limit $X\sigma$ are defined over continuous space. Each of these in turn must therefore be mapped from continuous to discrete space for comparison with the reconstruction. TilerDurden (a program used for Buffon integration; Baird & Santos’s paper, this special issue) is used to sample a placement of a lattice onto the field space (the origin O and orientation $\theta$ of the lattice with respect to the field space are sampled from flat priors). Because the lattice model is of nearest neighbour movement, the movement limit $X\sigma$ maps directly as the distance between tile centres $X\sigma$ on the lattice. For this lattice placement and tile size $\{O, \theta, X\}$, TilerDurden maps each field coordinate of the sampling localities onto its closest node on the lattice. Each point in the spatial hypothesis space has three parameters $\{\rho, \sigma, X\}$; X maps onto the lattice spacing, so it remains to map the density $\rho$ of individuals at carrying capacity (per km²) and the per generation scale $\sigma$ of the dispersal distribution (units kilometer). Multiplying $\rho$ by the area of a tile $(X\sigma)^2$ gives K, the number of (diploid) individuals in a tile at carrying capacity (i.e. maximum number of individuals). The scale of parent–offspring dispersal $\sigma$ relative to tile spacing $X\sigma$ constrains the probability $m_1$ of movement between tiles in the most recent generation. Baird and Santos introduce a neighbourhood size approximation that allows finite bounds on X but requires that m changes through time, asymptoting towards the past (see Baird & Santos’s paper, this special issue). The changing values of m, given the initial value $m_1$, are calculated using the m Vector function provided by Baird & Santos.

Fig. 2 Bayesian averaging over lattice placement. Here, we illustrate the Buffon’s needle game step of mapping our observations from the continuous field area of 1400 × 250 km detailed in Figure 1 onto a discrete space reconstruction lattice (see Baird & Santos’s paper, this special issue). Depending on the arbitrary choice of lattice origin and angle of orientation relative to the field area, two observation localities can fall in the same tile, face-neighbouring tiles, diagonally touching tiles or tiles with no contact. Each of these possibilities will have different consequences for inference. Bayesian averaging over lattice placement is ensured by sampling from flat priors a lattice origin and an orientation for each simulation performed during the first step of the ABC approach. The figure illustrates two instances of lattice placement with tile spacing ($X\sigma$ = 2.5 × 19 = 47.5 km). For a given lattice placement and tile spacing, sampling locality coordinates on the continuous Cartesian plane can be mapped to discrete stepping stone displacements on the lattice. Red arrows indicate examples of migration between neighbouring tiles. Note that the tiling relationship between sampled sites changes depending on the arbitrary placement of the lattice. N = Normanton, W = Westmoreland, B = Boroloola, T = Nathan river, R = Roper bar, M = McMinn station, D = Duck pond, K = Moroak station, E = Elsey station. Plausible introduction sites are in blue characters.
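The two mappings described above, from a field coordinate to its closest lattice node and from density to tile carrying capacity, can be sketched as follows. This is an illustrative sketch only and not the TilerDurden implementation: the function names, the rotate-then-round approach and the example coordinates are assumptions; only the relation K = ρ(Xσ)² and the example spacing of 2.5 × 19 = 47.5 km come from the text.

```python
import math


def tile_capacity(rho, sigma, x):
    """K = rho * (X * sigma)^2: diploid individuals per tile at carrying capacity."""
    return rho * (x * sigma) ** 2


def nearest_node(point, origin, theta, spacing):
    """Map a field coordinate (km) to the integer indices of its closest lattice node.

    `origin` and `theta` are the sampled lattice placement; `spacing` is X * sigma.
    """
    dx, dy = point[0] - origin[0], point[1] - origin[1]
    # Express the point in the lattice frame, whose axes are rotated by theta.
    u = dx * math.cos(theta) + dy * math.sin(theta)
    v = -dx * math.sin(theta) + dy * math.cos(theta)
    return round(u / spacing), round(v / spacing)


spacing = 2.5 * 19.0  # tile spacing X * sigma = 47.5 km
node = nearest_node((100.0, 30.0), origin=(0.0, 0.0), theta=0.0, spacing=spacing)
capacity = tile_capacity(rho=100.0, sigma=19.0, x=2.5)
```

Drawing a new origin and orientation for every simulation, as described for the Buffon averaging, would change which node each locality maps to, which is why the tiling relationship between two sites is not fixed.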

Simulations

Demographic simulations. The computer package SPLATCHE (Currat et al. 2004) was modified to simulate the dates of first observation as well as the genetic diversity of samples on a lattice, following the diffusion expectation in a continuous space. SPLATCHE produces forward time discrete generation stochastic simulations for demographic growth on a square tiling of the plane. Each tile undergoes independent population growth. Individuals are exchanged among nearest neighbour tiles. The population growth within each tile follows a discrete version of logistic regulation of the form $N_{t+1} = N_t(1 + r)/(1 + rN_t/K)$, where K is N at the carrying capacity and r is the growth rate. The expected number of emigrants M at time t from a given tile was computed, for each generation, as $M_t = m_t N_t$, where $m_t$ is the probability of moving between tiles and $N_t$ the number of individuals in the tile at time t, and the realized number of emigrants followed a Poisson distribution with mean $M_t$ (i.e. model 3 in Currat et al. 2004). A multinomial distribution with equal probabilities for the four neighbouring patches was then used to distribute the emigrants to the four neighbouring tiles. Given the close similarity between the SPLATCHE heuristic and a reconstruction lattice parameterized by $\{K, m_t, X\}$, a few modifications were necessary to use these simulations to provide the demographic component of the model. For a given hypothesis about the continuous field area $\{\rho, \sigma, X\}$, it was only necessary to (i) sample a lattice placement $\{O, \theta\}$ on the field area, (ii) sample a tile size $X\sigma$, (iii) map the sampling localities onto the lattice specified by $\{O, \theta, X\}$ (see illustration in Fig. 2), (iv) given X, calculate K and look up the appropriate values for $m_t$ and (v) ensure that r, K and $m_t$ could be passed as parameters to SPLATCHE. During demographic simulations, the dates of arrival of individuals into new tiles were recorded, assuming a generation time of one year (Estoup et al. 2004).
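One forward generation of the demographic model (logistic growth, Poisson number of emigrants, equal-probability allocation to the four neighbouring tiles) can be sketched as below. This is not SPLATCHE code: the function names, the rounding of tile sizes to integers, the order of growth before migration and the handling of edges (emigrants leaving the grid stay in their tile) are assumptions made for illustration.

```python
import math
import random


def logistic_growth(n, r, k):
    """Discrete logistic regulation: N_{t+1} = N_t (1 + r) / (1 + r N_t / K)."""
    return n * (1 + r) / (1 + r * n / k)


def poisson(rng, lam):
    """Poisson draw by Knuth's method, applied in chunks so exp(-lam) never underflows."""
    total = 0
    while lam > 0.0:
        chunk = min(lam, 20.0)
        limit, product = math.exp(-chunk), rng.random()
        while product > limit:
            total += 1
            product *= rng.random()
        lam -= chunk
    return total


def step(grid, r, k, m, rng):
    """Advance a rectangular grid of tile sizes by one generation."""
    rows, cols = len(grid), len(grid[0])
    grown = [[round(logistic_growth(n, r, k)) for n in row] for row in grid]
    new = [row[:] for row in grown]
    for i in range(rows):
        for j in range(cols):
            # Realized emigrants ~ Poisson(M_t = m_t * N_t), capped at the tile size.
            emigrants = min(poisson(rng, m * grown[i][j]), grown[i][j])
            for _ in range(emigrants):
                di, dj = rng.choice(((1, 0), (-1, 0), (0, 1), (0, -1)))
                ti, tj = i + di, j + dj
                if 0 <= ti < rows and 0 <= tj < cols:  # otherwise reflected back home
                    new[i][j] -= 1
                    new[ti][tj] += 1
    return new


start = [[0, 0, 0], [0, 100, 0], [0, 0, 0]]
after = step(start, r=0.0, k=1000, m=0.2, rng=random.Random(3))
```

With r = 0 the total number of individuals is conserved, which makes the migration step easy to check in isolation. In the model described above m would be a per-generation value $m_t$ rather than a constant.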

Genetic simulations. SPLATCHE performs backward time coalescent simulations conditioned on the information stored during forward time demographic simulations, reconstructing the genealogies of neutral genes for the simulated demography and migration history leading to the tiles where sampled genes are located in the present. Therefore, for a given continuous field area hypothesis $\{\rho, \sigma, X\}$ and forward time demographic realization for $\{r, \{K, m_t, X\}\}$, lineage movements were traced backward stochastically according to the stored $M_t$ and $N_t$ values, and lineages in the same tile were allowed to coalesce with probability $1/(2N_t)$, where $N_t$ is the stored number of diploid individuals placed in the tile during the demographic realization. Compared to the published version of SPLATCHE (Currat et al. 2004), several enhancements were added. The coalescent model was modified to follow that specified in the program SIMCOAL 2.1 (Laval & Excoffier 2004). We thus implemented the possibility of multiple coalescence events per generation, a generalized mutation model for microsatellite markers (GSM; Estoup et al. 2002) and the possibility of different mutation rates at different loci. Microsatellite allele size constraints were included in our simulations by imposing reflecting boundaries at the edge of an allele size range of 30 contiguous allelic states (Pollock et al. 1998). This size range is consistent with empirical data on repeat numbers at microsatellites in various species (reviewed in Estoup et al. 2002).
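A simple way to obtain a pairwise coalescence probability of $1/(2N_t)$ while allowing several coalescence events in the same generation is to let every lineage in a tile pick a parental gene copy at random. The sketch below illustrates that idea only; it is an assumption about how such a step could be written, not a description of the SPLATCHE or SIMCOAL implementation, and the function name is hypothetical.

```python
import random


def coalesce_in_tile(lineages, n_diploid, rng):
    """One backward generation within a tile holding n_diploid individuals.

    Each lineage picks one of the 2N parental gene copies uniformly at random;
    lineages that pick the same copy merge into a single ancestral lineage.
    """
    by_parent = {}
    for lineage in lineages:
        parent_copy = rng.randrange(2 * n_diploid)
        by_parent.setdefault(parent_copy, []).append(lineage)
    return [tuple(group) for group in by_parent.values()]


rng = random.Random(0)
ancestors = coalesce_in_tile(["a", "b", "c"], n_diploid=1, rng=rng)

# Empirical pairwise coalescence frequency for N = 5 (expected 1 / (2 * 5) = 0.1).
trials = 20000
merged = sum(len(coalesce_in_tile(["a", "b"], 5, rng)) == 1 for _ in range(trials))
pair_frequency = merged / trials
```

With three lineages and only two gene copies, at least two lineages must share a parent, so the first call always returns at most two ancestral lineages.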

Parameters. Our model includes eight demographic and two genetic parameters, each parameter having a uniform prior distribution used for Approximate Bayesian Computations (Table 1). The demographic parameters of the spatial expansion process are the effective density of individuals at carrying capacity ($\rho$ individuals per km²), the standard deviation of dispersal distance ($\sigma$, in km), the local population growth rate (r), the effective number of founders at the time of the initial introduction ($N_F$), the density threshold for detecting emerging populations during the expansion process ($D_S$ individuals per km²) and the maximum individual movement (X, measured in units of $\sigma$). The coalescence rate in the ancestral source population is specified by two parameters: the effective number of individuals during a bottleneck period (arbitrarily fixed to 20 generations), which is likely to occur in serially introduced species such as the cane toad (Easteal 1981) before the introduction in the studied area ($N_B$), and the effective number of individuals before the bottleneck period ($N_A$). We also recorded and estimated the composite parameters BR = $N_A/N_B$, which represents the intensity of bottleneck before the introduction, and the invasion speed at equilibrium $\alpha \approx 2\sqrt{0.5\,r\sigma^2}$, which represents the radial increase in km per generation of invasion range at dynamic equilibrium assuming a Gaussian distribution for intergenerational movement and a logistic regulation of population growth (e.g. equation 15 in Kot et al. 1996).
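The two composite parameters are simple functions of the primary ones. The sketch below takes the equilibrium invasion speed to be $\alpha \approx 2\sqrt{0.5\,r\sigma^2}$ and evaluates it at the point estimates quoted in the abstract, reading "about 400% per generation" as r = 4; that reading, the example values of $N_A$ and $N_B$ and the function names are assumptions.

```python
import math


def bottleneck_ratio(n_a, n_b):
    """BR = N_A / N_B: intensity of the bottleneck before the introduction."""
    return n_a / n_b


def invasion_speed(r, sigma):
    """Equilibrium radial speed (km per generation): 2 * sqrt(0.5 * r * sigma^2)."""
    return 2.0 * math.sqrt(0.5 * r * sigma ** 2)


speed = invasion_speed(r=4.0, sigma=19.0)         # about 53.7 km per generation
ratio = bottleneck_ratio(n_a=2000.0, n_b=20.0)    # hypothetical values, BR = 100
```

With a generation time of one year, this value is of the same order as the invasion speed of about 50 km per year reported in the abstract.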

The maximum individual movement $X\sigma$ has a non-negligible effect on at least some of the statistics used to summarize historical and genetic data, and hence, on our inferences (results not shown). This is because the ratio of $X\sigma$ to $\sigma$ specifies movement probabilities on the lattice. Changing X not only changes the bounds of the modelled parent–offspring distribution, but also its peakedness relative to the tails of the distribution (kurtosis; Baird & Santos’s paper, this special issue). In the particular context of spatial expansion, different X values may reflect different intensities of Allee effect (the tail of the dispersal distance distribution for a species characterized by strong Allee effect being thinner and hence corresponding to smaller X values), an ecological feature known for significantly interfering with the speed of the expansion process (Hasting et al. 2005). We represent our uncertainty about kurtosis/maximum distance as a uniform prior on X of [1.585, 3.545] $\sigma$. These bounds are the widest ones possible under the neighbourhood size approximation given that all migration probabilities must lie between 0 and 1 (Baird & Santos’s paper, this special issue).

With regard to mutation parameters, the average mutation rate across loci ($\mu$) was set to $6.2 \times 10^{-4}$ (Dib et al. 1996) and individual locus mutation rates $\mu_i$ were sampled from a Gamma(2; $2/\mu$) distribution of mean $\mu$ (Estoup et al. 2001). The average coefficients of the geometric distribution of mutation sizes for microsatellite markers (P) were set at 0.22 (Dib et al. 1996) and individual locus coefficients $P_i$ were sampled from an exponential distribution of mean P (Estoup et al. 2001).
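Per-locus mutation parameters can be drawn with the Python standard library, as sketched below. A Gamma(2; 2/μ) distribution with mean μ is read here as shape 2 and rate 2/μ, which corresponds to a scale of μ/2 in `random.gammavariate`; that reading and the function name are assumptions. The text does not say how values of $P_i$ outside the valid range of a geometric coefficient are handled, so no truncation is applied here.

```python
import random

MEAN_MU = 6.2e-4  # average mutation rate across loci
MEAN_P = 0.22     # average coefficient of the geometric distribution of mutation sizes


def sample_locus_parameters(n_loci, rng):
    """Return one (mu_i, P_i) pair per locus."""
    loci = []
    for _ in range(n_loci):
        mu_i = rng.gammavariate(2.0, MEAN_MU / 2.0)  # shape 2, scale mu/2 -> mean mu
        p_i = rng.expovariate(1.0 / MEAN_P)          # rate 1/P -> mean P
        loci.append((mu_i, p_i))
    return loci


ten_loci = sample_locus_parameters(10, random.Random(1))
```

Ten loci are drawn in the usage line because the cane toad samples were analysed at ten microsatellite loci.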

The general algorithm that lists all the steps of the estimation process, including the steps specific to Baird and Santos’s spatial model, is given in the Appendix.

Application to the cane toad invasion

We applied our methodological framework to the case of the recent introduction of the cane toad in northern Australia (see Estoup et al. 2004 for a thorough description of this introduction).

Simulated scenarios. We defined two introduction sce-narios to match two competing hypothesis about the ini-tial site and date of introduction in northern Australia(Fig. 1). Scenario 1 assumes that cane toads were intro-duced near Lagoon Creek with first observation in 1979(Freeland &Martin 1985), while scenario 2 postulates thatthe spatial expansion started near Normanton with firstobservation in 1964 (Easteal & Floyd 1986). The ecologicalconditions have remained relatively homogeneous overtime and space in the invaded area, which was isolatedfrom the main core of cane toad range expansion in theEast for at least 15 years (Easteal & Floyd 1986). Localcarrying capacity and population growth rate were there-fore assumed constant during the bioinvasion process.Because the expansion area is strongly constrained geo-graphically and ecologically by the ocean in the northand desert zones in the south, the potential range was

Table 1 Parameters of interest and prior distributions

Parameter Symbol Prior distribution

Density of diploid individuals at carrying capacity (individuals per km2) q Uniform [20, 2000]Standard deviation of dispersal distance per generation (km) r Uniform [5, 30]Local population growth rate r Uniform [0.1, 10]Initial number of effective founders introduced NF Uniform [2, 50]Density threshold for detecting emerging populations (individuals per km2) DS Uniform [0.05, 5]Maximum distance an individual can move per generation (measured in units of r) X Uniform [1.585, 3.545]Effective number of individuals during a bottleneck period (arbitrarily fixed to 20 generations)before introduction

NB Uniform [2, 100]

Effective number of individuals before the bottleneck period NA Uniform [200, 20000]

We also recorded and estimated the composite parameter BR = NA ⁄NB, which represents the intensity of bottleneck before the introduc-tion, and a ! 2

!!!!!!!!!!!!!0:5rr2

p, which represents the invasion speed at equilibrium corresponding to the radial increase in km per generation of

invasion range at dynamic equilibrium (Kot et al. 1996).The average mutation rate across loci (l) was set to 6.2 · 10)4 (Dib et al. 1996),and individual locus mutation rates li were sampled from a Gamma(2; 2=l) distribution of mean l (Estoup et al. 2001). The averagecoefficients of the geometric distribution of mutation sizes for microsatellite markers (P) were set to 0.22 (Dib et al. 1996), and individuallocus coefficients Pi were sampled from an exponential distribution of mean (Estoup et al. 2001).

© 2010 Blackwell Publishing Ltd

RECONSTRUCTING BIOINVASION DYNAMICS 891

represented by a 1400 × 250 km rectangle of uniformly suitable habitat with reflective edges (i.e. individuals are returned to their origin when they disperse further than boundaries).
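The priors of Table 1 and the two composite parameters can be drawn as in the minimal Python sketch below. It assumes continuous uniform draws for every parameter (including NF, NB and NA), takes the equilibrium invasion speed as 2·sqrt(0.5·r·σ²), and uses NumPy. The names `PRIORS` and `sample_priors` are illustrative only and are not part of SPLATCHE or of the programs used in this study.

```python
import numpy as np

# Uniform prior bounds from Table 1 (dictionary keys are illustrative names).
PRIORS = {
    "rho": (20.0, 2000.0),    # density at carrying capacity (individuals per km2)
    "sigma": (5.0, 30.0),     # s.d. of dispersal distance per generation (km)
    "r": (0.1, 10.0),         # local population growth rate
    "NF": (2.0, 50.0),        # effective number of founders
    "DS": (0.05, 5.0),        # detection density threshold (individuals per km2)
    "Omega": (1.585, 3.545),  # maximum movement, in units of sigma
    "NB": (2.0, 100.0),       # effective size during the bottleneck
    "NA": (200.0, 20000.0),   # effective size before the bottleneck
}


def sample_priors(n_draws, seed=None):
    """Draw n_draws parameter sets and add the composite parameters."""
    rng = np.random.default_rng(seed)
    draws = {name: rng.uniform(lo, hi, n_draws) for name, (lo, hi) in PRIORS.items()}
    draws["BR"] = draws["NA"] / draws["NB"]  # bottleneck intensity
    # Invasion speed at equilibrium, in km per generation.
    draws["alpha"] = 2.0 * np.sqrt(0.5 * draws["r"] * draws["sigma"] ** 2)
    return draws


draws = sample_priors(200_000, seed=0)
print(round(float(draws["alpha"].mean()), 1), round(float(draws["BR"].mean())))
```

Because BR and α are functions of uniformly distributed parameters, their own prior distributions are not uniform.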

Samples. Nine localities were sampled (30 adults per sample) in June 1999 along an approximately linear transect (Fig. 1) and analysed at ten nuclear microsatellite loci (Estoup et al. 2004). The dates of first observation were known for all nine localities (Fig. 1; reviewed in Estoup et al. 2004). The simulated spatial expansions were stopped at the sampling date (i.e. year 1999). Those demographic simulations for which any of the sampled localities was not colonized at sampling time were discarded. In Bayesian terms, this filtering of the data gives zero posterior probability to the parameter values leading to rejected data (Tanaka et al. 2006).

Summary statistics. We extracted several summary statistics from the data to be able to compare introduction scenarios and estimate parameters under the ABC framework described below. Five types of summary statistics were computed on the actual cane toad data set as well as on all simulated data sets: the dates of first observation for each sampled locality (except for the initial site of introduction), the mean number of alleles per locus and per locality sample, the mean expected heterozygosity (Nei 1987), the mean ratio of the number of alleles over the range of allelic sizes in base pairs (Excoffier et al. 2005) and Fst values between all pairs of locality samples (Weir & Cockerham 1984). We thus had a total of 71 summary statistics, eight for historical data and 63 for genetic data. The values of the summary statistics taken over the actual cane toad data set are given in Table S5 for genetic data and in Fig. 1 for dates of first observation.
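The count of 71 statistics follows directly from the sampling design, as the short Python sketch below shows. It also illustrates one of the statistics, expected heterozygosity, using the standard unbiased estimator n/(n − 1) × (1 − Σ p²) over n sampled gene copies. The function names and the toy allele sizes are hypothetical, and whether the original programs applied exactly this estimator is an assumption.

```python
from collections import Counter
from itertools import combinations


def count_summary_statistics(n_localities=9):
    """Return (number of historical statistics, number of genetic statistics)."""
    n_dates = n_localities - 1               # first-observation dates, initial site excluded
    n_per_locality = 3 * n_localities        # alleles, heterozygosity, allele/range ratio
    n_fst = len(list(combinations(range(n_localities), 2)))  # all pairs of localities
    return n_dates, n_per_locality + n_fst


def expected_heterozygosity(allele_sizes):
    """Unbiased expected heterozygosity at one locus from a list of gene copies."""
    n = len(allele_sizes)
    freq_sq = sum((count / n) ** 2 for count in Counter(allele_sizes).values())
    return n / (n - 1) * (1.0 - freq_sq)


n_hist, n_gen = count_summary_statistics(9)
print(n_hist, n_gen, n_hist + n_gen)
```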

Historical data. The simulated dates of first observation at sampled locations were taken from SPLATCHE when the density of individuals within a newly colonized tile became larger than the density threshold parameter DS (individuals per km²). Cane toads are usually detected on roads at night and reported by the general public. Adult toads are large and highly visible, and public awareness is high. Sensible DS values are hence likely to be low. We therefore represented our uncertainty about the threshold with a uniform prior on [0.05, 5] effective individuals per km² (Table 1). For the same reason that dates of first observation depend on toad density, there is uncertainty associated with the reported dates of initial introduction. We assumed that these reports were subject to the same density threshold. The demographic model assumes logistic growth at rate r from NF founder individuals. For a given rate and threshold, it takes some generations (years) for founders to achieve sufficient density to be reported. The date of initial introduction in the invaded area was calculated as the date of first observation at the initial introduction site (1979 in Lagoon Creek for scenario 1 or 1964 in Normanton for scenario 2) minus this number of generations.
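The lag between introduction and first report can be illustrated with the following Python sketch. It is only a sketch under stated assumptions: the discrete logistic update N(t+1) = N(t) + r N(t) (1 − N(t)/K), clamped to [0, K], is a common textbook form and not necessarily the update implemented in SPLATCHE; the 100 km² tile area is an arbitrary illustrative value; and the function name is hypothetical.

```python
def generations_to_detection(n_founders, r, rho, ds, tile_area_km2, max_generations=1000):
    """Generations needed for a founder population to exceed the detection density.

    Assumes a discrete logistic update clamped to [0, K]; this is an
    illustrative choice, not necessarily the one used by SPLATCHE.
    """
    k = rho * tile_area_km2          # carrying capacity of the tile (individuals)
    threshold = ds * tile_area_km2   # detection threshold (individuals)
    if threshold >= k:
        raise ValueError("detection threshold must be below carrying capacity")
    n = float(n_founders)
    generations = 0
    while n <= threshold:
        if generations >= max_generations:
            raise RuntimeError("population never reached the detection threshold")
        n = min(k, max(0.0, n + r * n * (1.0 - n / k)))
        generations += 1
    return generations


# Parameter values of the simulated test data sets; the tile area is an assumption.
lag = generations_to_detection(n_founders=15, r=4.0, rho=100.0, ds=2.0, tile_area_km2=100.0)
introduction_year = 1979 - lag  # scenario 1: first observation at Lagoon Creek in 1979
print(lag, introduction_year)
```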

Approximate Bayesian computation

In such a complex model of bioinvasion, it is difficult to express the likelihood in mathematically explicit form. To overcome this difficulty, we estimated parameters and their posterior distribution through an ABC based on summary statistics (instead of on the full data) as described in Beaumont et al. (2002). Briefly, the ABC approach involves three steps. The first step consists of simulating many data sets with characteristics similar to the observed data set (same number of samples, same number of individuals per sample, same number of loci, same geographical location of sampled sites) using parameter values drawn from prior distributions (as defined in Table 1). The simulation outputs and their associated parameters are stored in a reference file. The second step consists of comparing the simulations to the observations by means of summary statistics such as those described above and discarding those simulations that are very different from the observations. The difference between sets of statistics is computed as the Euclidean distance (d) between them. The nr simulation outcomes with the smallest Euclidean distance from the observations are retained. An nr value of 40 000 was chosen for all our analyses because we found that, in keeping with Beaumont et al. (2002), the ABC method performed similarly for nr values between 4000 and 80 000 (results not shown). The third step is the estimation of the parameters by a local linear regression of parameters on summary statistics. All demographic parameters were log-transformed prior to the regression and back-transformed to obtain posterior densities on the original scale (Estoup et al. 2004). Similar estimation results were obtained when using a weighted Euclidean distance as proposed in Hamilton et al. (2005a) and/or a log-tangent transformation of parameters as proposed in Hamilton et al. (2005b) (results not shown).
Because a substantial fraction of simulations was rejected as contradicting the known history of colonization (the filtering mentioned above), around 4 × 10⁶ simulations were necessary (with slight variation depending on the scenario and prior distributions) to obtain a reference data set of 2 × 10⁶ simulated data sets. This ABC step 1 required a single day of computation on a cluster with 30 2-GHz nodes. In comparison, steps 2 and 3 took only a few minutes on a single computer.
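Steps 2 and 3 can be sketched in a few lines of Python with NumPy. The sketch below is not the program used in this study. It assumes that summary statistics are standardized by their standard deviation across the reference file before distances are computed, that retained simulations receive Epanechnikov-type weights, and that parameters have already been log-transformed by the caller. The function name `abc_local_linear` is hypothetical.

```python
import numpy as np


def abc_local_linear(params, stats, observed, n_retained):
    """Rejection step followed by a weighted local linear regression adjustment.

    params:   (n_sims, n_params) array of (transformed) parameter values
    stats:    (n_sims, n_stats) array of simulated summary statistics
    observed: (n_stats,) array of observed summary statistics
    Returns the adjusted parameter values and their regression weights.
    """
    scale = stats.std(axis=0)
    scale[scale == 0] = 1.0                      # guard against constant statistics
    z = (stats - observed) / scale               # centred on the observation
    dist = np.sqrt((z ** 2).sum(axis=1))         # Euclidean distance to the observation
    keep = np.argsort(dist)[:n_retained]         # step 2: keep the closest simulations
    d_max = dist[keep].max()
    if d_max > 0:
        weights = 1.0 - (dist[keep] / d_max) ** 2  # Epanechnikov-type weights
    else:
        weights = np.ones(n_retained)
    # Step 3: weighted least squares of parameters on centred statistics.
    design = np.column_stack([np.ones(n_retained), z[keep]])
    sw = np.sqrt(weights)[:, None]
    coef, *_ = np.linalg.lstsq(design * sw, params[keep] * sw, rcond=None)
    adjusted = params[keep] - z[keep] @ coef[1:]  # project onto the observed statistics
    return adjusted, weights
```

A weighted density estimate or weighted quantiles of the adjusted values then approximate the posterior distribution of each parameter.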

The programs that have been developed or modified to simulate historical and genetic data and for the


892 A. ESTOUP ET AL .

estimation of parameters and posterior probability of scenarios are available (Linux version) from A.E. upon request. The unmodified SPLATCHE program is available on http://cmpg.unibe.ch/software/splatche. The TilerDurden program and m Vector function are available from Baird and Santos, this special issue.

Comparison of introduction scenarios

The ABC framework was used to discriminate between our two statistical models (i.e. the combination of the simulated scenarios and the prior distributions on model parameters). The simulated scenarios corresponded to the invasion scenarios, which differed by the geographical location and date of the initial introduction site (Figs 1 and 2). As the priors on parameters do not differ between the statistical models, for simplicity's sake, we will hereafter refer simply to the comparison of scenarios. The prior probability of each scenario was set to 1/2, and their posterior probabilities were simply computed as the proportion of simulations performed under scenarios 1 and 2 that were represented within the nd simulations with smallest associated Euclidean distances (the so-called direct method mentioned in Cornuet et al. 2008; see Fagundes et al. 2007; Toni et al. 2009 and Leuenberger & Wegmann 2010 for more sophisticated methods to perform model choice in an ABC framework). We computed the posterior probability of each scenario for nd ranging from 10 to 1000, to see how it varied with nd (see Fig. 3). With the direct method, the precision of the posterior probability estimation of scenarios is expected to decrease when Euclidean distance values d and hence nd increase (e.g. Guillemaud et al. 2010). On the other hand, a large variance of those estimations is also expected when the number of d values retained (nd) is too small. Comparison of the bioinvasion scenarios was made before estimating posterior densities for demographic parameters, so that we only detail estimation of parameters from the most likely of our two scenarios.
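The direct method amounts to counting scenario labels among the closest simulations. A minimal Python sketch, assuming equal prior probabilities and the same number of simulations per scenario in the pooled reference file (the function name and the toy values are hypothetical):

```python
import numpy as np


def direct_posterior(distances, scenarios, n_closest):
    """Posterior probability of each scenario by the direct method.

    distances: Euclidean distance of every simulation to the observed data
    scenarios: scenario label of every simulation (e.g. 1 or 2)
    n_closest: number of closest simulations retained (nd in the text)
    """
    distances = np.asarray(distances)
    scenarios = np.asarray(scenarios)
    closest = np.argsort(distances, kind="stable")[:n_closest]
    return {int(s): float(np.mean(scenarios[closest] == s)) for s in np.unique(scenarios)}


# Toy example with five simulations pooled over two scenarios.
print(direct_posterior([0.1, 0.2, 0.3, 0.4, 0.9], [1, 1, 2, 1, 2], n_closest=4))
```

Repeating the call over a range of `n_closest` values gives the kind of sensitivity curve described above.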

Performance of the method

Computations dealing with the evaluation of the performance of the ABC method studied here are very time-consuming. Although useful, the exploration of the complete multidimensional parameter space (as defined by priors and scenario) is hence not an easy task. Moreover, such prior-based exploration only provides an averaged picture of the performance of the methods over a large multidimensional parameter space, neglecting areas of particular interest (i.e. those that fit the real data set analysed). We have hence chosen to restrict our performance analyses to the parameter space of interest with respect to the real cane toad data set (i.e. the most likely scenario 1 with fixed parameter values chosen to be close to the best estimates for that scenario; see Results section). We performed ABC estimations on 500 simulated test data sets to obtain an estimate of the distribution of the posterior probability of the chosen scenario (i.e. scenario 1) relative to scenario 2. We then measured the number of times the posterior probability for scenario 1 was larger than 0.95. We also computed the median of the estimated posterior distribution of the parameters under scenario 1, which was then used as a point estimate to compute, for each parameter, the relative bias and the relative Root Mean Square Error. Finally, the performance of the methods was assessed for data sets differing by their information content (genetic or historical information). We thus considered G, H and GH data sets including genetic, historical and both data, respectively.
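The two accuracy measures and the power measure can be written compactly. The Python sketch below assumes the usual definitions, with both bias and RMSE expressed relative to the true (simulated) parameter value; the exact normalization used in this study is not spelled out here, so this is an assumption, and the function names are hypothetical.

```python
import numpy as np


def relative_bias_and_rmse(point_estimates, true_value):
    """Relative bias and relative RMSE of point estimates over test data sets."""
    rel_error = (np.asarray(point_estimates, dtype=float) - true_value) / true_value
    return float(rel_error.mean()), float(np.sqrt((rel_error ** 2).mean()))


def fraction_above(posterior_probabilities, cutoff=0.95):
    """Proportion of test data sets whose posterior probability exceeds the cutoff."""
    return float(np.mean(np.asarray(posterior_probabilities) > cutoff))


# Toy example: three posterior medians for a parameter whose true value is 100.
bias, rmse = relative_bias_and_rmse([90.0, 110.0, 120.0], 100.0)
print(round(bias, 3), round(rmse, 3))
```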

Robustness of the method

We assayed the robustness of our inferences on both scenario comparison and parameter estimation by producing additional reference files, (i) assuming a LogUniform [0.05, 20] distribution (instead of a Uniform [0.05, 5]) as prior on the density threshold for detecting new populations during the expansion process, (ii) considering an alternative very large potential range of expansion (i.e. a 14 000 × 2500 km rectangle), hence eliminating any possibility of the invasion reaching the edges over the timescale considered and (iii) allowing the mutation parameters μ and P to vary (instead of being fixed to 6.2 × 10⁻⁴ and 0.22, respectively) following uniform distributions in [10⁻⁴, 10⁻³] and [0.05, 0.35], respectively. Finally, scenario comparison and parameter estimation

[Figure 3: line plot. y-axis: posterior probability of scenario 1 (0.5 to 1); x-axis: 0 to 1000; curves: H, GH and G.]

Fig. 3 Posterior probability of an introduction of northern Australian cane toads in Lagoon Creek (scenario 1) vs. Normanton (scenario 2). nd is the number of data sets with the smallest Euclidean distances used to estimate the posterior probability of each introduction scenario (see Materials and Methods section for details). Reference files were obtained using the set of prior distributions given in Table 1. Results are reported for the cases where one uses information on both genetic and historical data (GH), genetic data alone (G) or historical data alone (H).


were also assayed using reference files with 2 × 10⁵ data sets (instead of 2 × 10⁶).
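For the alternative prior on the detection threshold, a log-uniform draw can be obtained by sampling uniformly on the log scale and exponentiating. This is a minimal Python sketch with a hypothetical function name, not the code used to build the additional reference files:

```python
import numpy as np


def sample_log_uniform(low, high, n_draws, seed=None):
    """Draw from a LogUniform[low, high] distribution."""
    rng = np.random.default_rng(seed)
    return np.exp(rng.uniform(np.log(low), np.log(high), n_draws))


# Alternative prior on the density threshold DS (individuals per km2).
ds_alt = sample_log_uniform(0.05, 20.0, 100_000, seed=1)
print(float(np.median(ds_alt)))
```

Compared with Uniform [0.05, 5], this prior spreads its mass evenly across orders of magnitude, so both very low and higher thresholds are explored.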

Results

Comparison of alternative introduction scenarios

A hitherto unanswered question concerning the invasion of northern Australia by the cane toad is whether the current population stems from an introduction near Lagoon Creek with a first observation in 1979 (Freeland & Martin 1985) or an earlier introduction in 1964 further east near Normanton (Easteal & Floyd 1986). Accordingly, we modelled a different scenario for each situation: scenario 1 starting invasion at Lagoon Creek in 1979 and scenario 2 at Normanton in 1964 (see Figs 1 and 2). Taking the two scenarios as an additional parameter with uniform prior in the ABC, we found that scenario 1 has the higher posterior probability for all data sets and that this conclusion is unchanged within a reasonable range of the number of data sets 'closest' to the observed data set (nd = 10–1000; Fig. 3). Using historical data (i.e. dates of first sighting at different sites) alone or in combination with microsatellite genetic data from contemporary samples, scenario 1 has a posterior probability >98% for nd = 100 and larger than 90% for nd = 1000. This result strongly contradicts an initial introduction of cane toads at Normanton being the main source of toads in the studied area despite an earlier first observation. Note however that the posterior probabilities of the two scenarios are less differentiated when using genetic data only (75% and 64% for scenario 1 for nd = 100 and 1000, respectively), suggesting that genetics alone has a lower power to discriminate between our two scenarios. Similar results were obtained using a different prior on the density threshold for detecting emerging populations during the expansion process, different priors for genetic parameters and considering a very large potential range of expansion hence eliminating edge effects (Table S1).

The power of our methodology to discriminate between scenarios is shown in Fig. 4. This simulation study confirms the good performance of our approach for determining the correct introduction site and date when using genetic and historical information (GH) or just historical data (H). The posterior probability of the 'true' simulated scenario was larger than that of the 'false' scenario for all simulated GH and H data sets, and it was larger than 0.95 in 38% (mean = 0.920) and 99% (mean = 0.999) of the simulated cases for data sets of type GH and H, respectively. It also appears that the use of genetic data alone (G) considerably reduces the power to identify the correct scenario. Although the posterior probability of the 'true' simulated scenario was larger than that of the 'false' scenario for 82% of the simulated data sets, this posterior probability was larger than 0.95 in <0.2% of the cases (mean = 0.584).

Estimation of parameters

The relevant parameters of the invasive process studied here with their prior distributions are detailed in Table 1. The posterior distributions of those parameters inferred under the scenario with the highest statistical support (i.e. scenario 1) are presented in Fig. 5, and point estimates are listed in Table 2. Only combining genetic and historical information (data set GH) allows concomitant inference on most demographic parameters of our model. For the most informative data set GH, we find evidence for a marked bottleneck before the introduction (BR = 845, 90% CI: 123–2727), a small initial founding propagule (NF) of 14.1 individuals (90% CI: 3.8–31.6), a large population growth rate (r) of 360% (90% CI: 130–760%), an estimated (median) value of the standard deviation of dispersal distance (σ) of 18.9 km (90% CI: 14.1–24.5) and an invasion speed at equilibrium (α) of 49.3 km per year (90% CI: 35.2–66.7). The maximum individual movement (Ω) tends towards values around 2.5 σ per generation or around 50 km, that is, the speed of invasion is only limited by the maximum individual movement (because the growth rate is so high). The posterior distribution curves do not differ noticeably from the priors for the density at carrying capacity (ρ) and the density threshold for detecting emerging populations (DS), indicating that the data do not contain information about these parameters for the model considered. Inferences based on genetic data alone (data set G) support larger values for σ, r and α and slightly lower values for NF. Historical data alone (data set H) gave estimates similar to those of data set GH for σ, r and α. However, the posterior density curves obtained with data set H did not differ noticeably from the priors for NF and BR, indicating that historical data are not informative for these parameters.
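As a rough consistency check, the posterior medians of r and of the standard deviation of dispersal distance for data set GH can be plugged into the equilibrium-speed formula of Table 1. The result is close to, but need not equal, the reported posterior median of the composite parameter (49.3 km per generation), because the median of a function of parameters generally differs from the function of their medians. The second expression converts the maximum individual movement of about 2.5 dispersal standard deviations into kilometres.

```latex
\alpha \approx 2\sqrt{0.5\, r\, \sigma^{2}}
       = 2\sqrt{0.5 \times 3.6 \times 18.9^{2}}
       \approx 50.7 \ \text{km per generation},
\qquad
\Omega\,\sigma \approx 2.5 \times 18.9 \approx 47 \ \text{km}.
```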

Similar estimations were obtained using a different prior on the density threshold for detecting emerging populations, different priors for genetic parameters and when eliminating any edge effects (Table S2). On the other hand, estimations of r, α and NF differed considerably when assuming the less supported scenario 2 (Table S2). Under this scenario, inferences support lower values for r (median = 0.9) and α (median = 22.2) and larger values for NF (median = 22.0). This result underlines the importance of discriminating between different introduction scenarios before inferring parameters.

Accuracy of parameter estimation

Results on the accuracy of estimations under scenario 1 are presented in Table 3. This simulation study shows


that, with data set GH, estimations are precise for the demographic parameters σ, r and α because their relative bias is smaller than 5% and their relative Root Mean Square Error (RMSE) is smaller than 0.25. The parameter Ω seems well estimated, but in fact its posterior differs little from the prior, which is confirmed by direct observation of posterior distribution shape (results not shown). A lower accuracy was obtained for NF (bias = 0.480, RMSE = 0.700), BR (bias = 0.039, RMSE = 0.520) and DS (bias = −0.335, RMSE = 0.348). A large bias of 8.781 and a RMSE of 9.017 illustrate the poor accuracy of the estimation of ρ.

Analyses based on genetic data only (data set G) show lower accuracy than data set GH for all parameters except NF and BR (Table 3). The bias and the RMSE values increased considerably for σ (from 0.037 to 0.175 and from 0.096 to 0.192, respectively), r (from 0.010 to 0.406 and from 0.249 to 0.449, respectively) and α (from 0.001 to 0.280 and from 0.112 to 0.305, respectively). Analysis based on historical data only (data set H) indicates a similar accuracy of estimation for the parameters σ, r and α (with however a larger bias for r and α) when compared to the data set GH. Considerably larger bias and RMSE values were obtained for NF and BR.

Discussion

The landscape-ABC method presented here is a generic model-based method that allows inference to be made on the introduction and spatial expansion dynamics of a species, using a combination of genetic, historical and geographical data. A major advantage of the method is that it can easily include various types and richness of genetic and historical data in a spatially explicit context. Its Bayesian foundation provides the opportunity for assimilation of expert information on quantities of interest from external sources through prior distributions. The method can be applied to discriminate between introduction scenarios and to make inference on key demographic parameters for various situations of 1D or 2D spatial

[Figure 4: three histogram panels (Dataset GH, Dataset H, Dataset G). x-axis: posterior probability (scenario 1); y-axis: counts over 500 simulated files.]

Fig. 4 Power of the landscape-ABC method to distinguish between introduction scenarios. The posterior probabilities of scenarios 1 and 2 were computed using the nd = 100 'best' Euclidean distances between each of 500 test data sets simulated under scenario 1 and the reference files produced from the set of prior distributions given in Table 1. The 500 test data sets have been simulated with parameter values fixed at ρ = 100, σ = 19, r = 4, NF = 15, BR = 667 (with NA = 10 000 and NB = 15), DS = 2 and Ω = 2.5 (see Table 1 for parameter definition). Information used for inferences includes both genetic and historical data (GH), genetic data (G) or historical data (H). Note the different scales (counts) of the y-axes.


expansions. Such situations include recent expansions typical of biological invasions as well as more ancient expansions, such as the colonization of a continent by early modern humans. However, because the production of reference files is computationally demanding, the method may be difficult to apply for ancient and large-scale spatial expansions, unless the number of simulated data sets produced to obtain the reference file is considerably reduced. Focusing on the present case study, we ran additional simulations to evaluate the performance of the method in making inferences over scenarios and parameters using reference files of 2 × 10⁵ data sets instead of 2 × 10⁶ (keeping the same relative number of accepted data sets). We observed only a slight decrease in the performance of the method for reference files of 2 × 10⁵ data sets, suggesting that computation time could be reduced by a factor of 10 while maintaining a relatively similar level of performance (Tables S1–S4). The effect of a strong reduction in the number of simulations on the level of uncertainty of parameter estimation (i.e. width of confidence intervals) remains to be further assessed, however, using a measure of the accuracy based on the whole posterior distribution. Moreover, the potential benefit of recently published improvements on the standard ABC approach (e.g. ABC-MCMC or ABC population Monte Carlo; Wegmann et al. 2009 and Beaumont et al. 2010, respectively) in the context of the complex spatial model studied here still needs to be assessed.

Performance and robustness of method

The performance of the landscape-ABC method was assessed for data sets that differed by their information content (i.e. data sets GH, G and H), allowing comparisons to be made between the information carried by genetic and historical data. Our results underline the importance of historical data when aiming at discriminating between introduction scenarios. The discriminating

[Figure 5: one density panel per parameter of Table 1 and per composite parameter; y-axis: density.]

Fig. 5 Posterior density of parameters for the cane toad invasion of northern Australia. The posterior densities were obtained under the most likely introduction scenario (i.e. scenario 1; Fig. 1) and using the reference file produced from the set of prior distributions given in Table 1. Information used for inferences includes both genetic and historical data (red curve), genetic data (blue) or historical data (black). Parameter definitions are the same as in Table 1. All prior and posterior densities are based on samples of 100 000 and 40 000 values, respectively. Dotted lines represent prior distributions. BR = NA/NB and $\alpha \approx 2\sqrt{0.5\,r\,\sigma^{2}}$ are composite parameters and are therefore associated with nonuniform prior distributions.


power was low when using genetic data alone, at least for the B. marinus invasion considered here (i.e. low level of polymorphism and weak genetic structure). We found that even a small number of dates of first observation considerably improved our ability to discriminate between scenarios (results not shown). With regard to inference over demographic parameters, our performance study underlines the importance of considering both genetic and historical information for making concomitant inferences on all demographic parameters of our introduction-spread model. Genetic data alone provide less accurate estimation of dispersal distance per generation, population growth rate and invasion speed at equilibrium, whereas historical data alone were not informative on the occurrence of a bottleneck before the introduction and the size of the initial founding propagule. Both real and simulated data indicate that historical and genetic data complement each other without obvious antagonistic interactions in terms of estimation. In agreement with previous studies (Beaumont 2008; Cornuet et al. 2008; Guillemaud et al. 2010), the discrimination power globally increased for GH, G or H data sets when using a more computer time-consuming method based on logistic regression to estimate the posterior probabilities of scenarios instead of the direct method (results not shown).

Whatever the data set considered (GH, G or H), the data contained very little information on the density at carrying capacity, ρ. This result is not surprising with regard to historical data, as this parameter is known to have little influence on the spatial expansion dynamics

Table 2 Estimated parameters for the invasion of cane toad in northern Australia using the landscape-ABC method

Parameter | Distribution | Mean | Mode | Median | Q5% | Q95%
ρ | Prior | 1010 | IR | 1010 | 120 | 1900
ρ | Posterior GH | 982 | IR | 958 | 105 | 1942
ρ | Posterior G | 1030 | IR | 1001 | 105 | 1996
ρ | Posterior H | 998 | IR | 997 | 111 | 1889
σ | Prior | 17.5 | IR | 17.5 | 6.3 | 28.8
σ | Posterior GH | 19.0 | 18.5 | 18.9 | 14.1 | 24.5
σ | Posterior G | 21.9 | 21.4 | 21.8 | 15.0 | 28.9
σ | Posterior H | 18.5 | 17.6 | 18.2 | 13.1 | 24.7
r | Prior | 5.0 | IR | 5.0 | 0.6 | 9.5
r | Posterior GH | 3.9 | 3.1 | 3.6 | 1.3 | 7.6
r | Posterior G | 5.6 | 5.7 | 5.6 | 1.5 | 9.8
r | Posterior H | 4.0 | 2.6 | 3.4 | 1.2 | 8.3
α | Prior | 52.7 | 39.6 | 48.1 | 13.3 | 106.8
α | Posterior GH | 49.9 | 48.5 | 49.3 | 35.2 | 66.7
α | Posterior G | 70.0 | 60.5 | 67.1 | 41.2 | 108.7
α | Posterior H | 47.5 | 45.5 | 46.7 | 35.7 | 61.6
NF | Prior | 26.0 | IR | 26.0 | 4.4 | 47.6
NF | Posterior GH | 15.5 | 11.2 | 14.1 | 3.8 | 31.6
NF | Posterior G | 12.2 | 8.7 | 10.3 | 2.6 | 27.1
NF | Posterior H | 26.3 | IR | 26.4 | 4.8 | 47.2
BR | Prior | 404 | 177 | 199 | 23 | 1487
BR | Posterior GH | 1062 | 706 | 845 | 123 | 2727
BR | Posterior G | 1020 | 702 | 823 | 111 | 2596
BR | Posterior H | 400 | 172 | 196 | 23 | 1476
DS | Prior | 2.5 | IR | 2.5 | 0.3 | 4.8
DS | Posterior GH | 2.9 | IR | 2.8 | 0.6 | 4.9
DS | Posterior G | 2.5 | IR | 2.5 | 0.3 | 4.6
DS | Posterior H | 2.7 | IR | 2.7 | 0.7 | 4.8
Ω | Prior | 2.6 | IR | 2.6 | 1.6 | 3.4
Ω | Posterior GH | 2.5 | 2.3 | 2.4 | 1.7 | 3.3
Ω | Posterior G | 2.5 | 2.4 | 2.5 | 1.7 | 3.3
Ω | Posterior H | 2.6 | 2.3 | 2.6 | 1.7 | 3.3

The estimates were obtained under scenario 1 (introduction in Lagoon Creek) using information including both genetic and historical data (GH), genetic data (G), or historical data (H). Parameter and prior definitions are the same as in Table 1. Mean, mode and quantile values are estimated from samples of 100 000 and 40 000 values for priors and posteriors, respectively. IR: irrelevant.

Table 3 Performance of the landscape-ABC method for inferring parameters in the B. marinus invasive process

Data set | Parameter | Bias | RMSE
GH | ρ | 8.727 | 8.781
GH | σ | 0.037 | 0.096
GH | r | 0.010 | 0.249
GH | NF | 0.480 | 0.700
GH | α | 0.001 | 0.112
GH | BR | 0.039 | 0.520
GH | DS | −0.335 | 0.348
GH | Ω | 0.028 | 0.086
G | ρ | 8.509 | 8.550
G | σ | 0.175 | 0.192
G | r | 0.406 | 0.449
G | NF | 0.263 | 0.531
G | α | 0.280 | 0.305
G | BR | 0.038 | 0.511
G | DS | −0.381 | 0.385
G | Ω | 0.059 | 0.081
H | ρ | 9.020 | 9.017
H | σ | 0.029 | 0.095
H | r | −0.120 | 0.251
H | NF | 0.801 | 0.808
H | α | −0.069 | 0.112
H | BR | −0.702 | 0.705
H | DS | −0.326 | 0.327
H | Ω | 0.072 | 0.111

Bias and RMSE values were computed from 500 test data sets simulated under scenario 1 (introduction in Lagoon Creek) with parameter values fixed at ρ = 100, σ = 19, r = 4, NF = 15, BR = 667 (with NA = 10 000 and NB = 15), DS = 2 and Ω = 2.5. For each test data set, median values of parameters have been estimated using the reference file produced under scenario 1 from the set of prior distributions given in Table 1. Information used for inferences includes both genetic and historical data (GH), genetic data (G) or historical data (H). Parameter and prior definitions are the same as in Table 1.


and hence on dates of first observation when ρ values are sufficiently large (e.g. Okubo & Levin 2001). We could however have reasonably expected that different ρ values would translate into different levels of genetic drift once populations have reached carrying capacity, suggesting that genetic data would be informative on ρ. However, the time scale of the invasion process studied here (i.e. a few tens of generations) is probably too short to allow different ρ values to translate into different levels of genetic drift, so that effects of drift are actually controlled by the population growth rate and the initial number of founders. One is unlikely to be able to obtain information on ρ when analysing recent bioinvasion histories with rapid growth, and it remains to be tested whether this would be different when considering older invasion scenarios (e.g. the colonization of a continent by early modern humans).

Our performance study based on simulated data sets was structured to match the features of the spatial expansion of B. marinus in northern Australia. Conclusions should hence be restricted to this particular situation and would probably differ for different sampling designs and invasion histories (number and location of informative sites, number and level of polymorphism of loci, date of introduction, dispersal and growth rate features, etc.). Because the production of reference files requires much computer-processing time, we could not thoroughly explore here the impact of various sampling designs and invasion histories. For similar reasons, we restricted our robustness study to changes in the prior for the density threshold for detecting emerging populations, in the priors for genetic parameters and in the size of the potential range of expansion to test for the impact of edge effects on our inferences. A density threshold for detection is difficult to specify with any certainty as it depends on numerous factors, and edge effects for realistic range boundaries are difficult to model. It is hence fortunate that our results indicate good resilience to such changes.

Spatial expansion of the cane toad

The demographic and spatial model assumed in the present study is substantially different from, and involves different parameters than, that assumed in a previous analysis of cane toad population expansion (Estoup et al. 2004), preventing direct comparisons of our results with those obtained in that study. In Estoup et al.'s study (2004), the demographic models included a linear series of discrete demes founded at each generation. The number of modelled demes in the area of spatial expansion was hence relatively arbitrary. Moreover, although strong geographical constraints exist on the considered expansion area, a 1D model was less realistic than the 2D model used here. In the present study, we overcame these problems by considering a 2D lattice model in which the number of cells is scaled by the value of the standard deviation of dispersal distances per generation (σ). In Estoup et al.'s study (2004), the parameters of interest were the stable effective population size (reached in a single generation after foundation), the effective number of founders at each generation and the migration rate between adjacent subpopulations. The parameters in the present study (initial number of introduced individuals, population growth rate, density at carrying capacity and dispersal distance per generation) are biologically more sensible and are a better reflection of the idea of an introduction followed by a spatial expansion in a 2D continuous and ecologically homogeneous landscape.

The strong support for the occurrence of intense bottleneck event(s) prior to introduction is in agreement with the well-documented serial intentional introductions of a limited number of cane toads in several Caribbean and Pacific islands before the species was introduced in Australia (Easteal 1981; Estoup et al. 2001). An initial number of effective founders as low as ten individuals at Lagoon Creek may actually represent a considerably larger number of living individuals. This is because census:effective population size ratios can be very high in anurans (approximately 100:1), especially for prolific species such as the cane toad (e.g. Scribner et al. 1997). The high population growth rate inferred here (around 400%) is in agreement with the fact that B. marinus is a prolific species (7500–20 000 eggs/female; Alford et al. 1995) and that a sudden short-term population explosion has been documented in most newly founded populations (Easteal 1981; personal observations). The large standard deviation of dispersal distances inferred (around 19 km per generation) coincides with the nomadic behaviour of B. marinus evidenced by mark-recapture and radiotracking studies (Schwarzkopf & Alford 2002; Phillips et al. 2007), as well as with the high capability of the species for rapid colonization of large areas (e.g. Easteal et al. 1985). The large invasion speed at equilibrium (around 50 km per year) compares favourably with an independent estimation of this parameter based on field surveys of toads in the most recent part of the northern expansion area (i.e. around 55 km per year between 2000 and 2005; Phillips et al. 2007).

Towards more sophisticated models of spatial expansion

Our approach represents a significant step towards a generic model-based method allowing one to make statistical inferences on bioinvasions combining various sources of information. Our reconstruction of the introduction and spatial expansion processes, however, inevitably includes approximations and simplifications of the actual process. First, in our model, the ancestral population that serves as a source for individuals of the spatially expanding population is a Wright–Fisher type population (i.e. it is not spatially structured). It remains to be tested whether this simplification, which considerably reduces computation times, has a significant effect on parameter estimation. Second, we found that scenario 1 (initial introduction of cane toad at Lagoon Creek with first observation in 1979) was clearly more probable than scenario 2 (initial introduction at Normanton with first observation in 1964). A potentially more realistic introduction history of cane toad would include two nonindependent introductions, the first one at Normanton and the second one later at Lagoon Creek, with the individuals introduced at Lagoon Creek originating from the Normanton area. Another alternative scenario would include several expansion phases of different speeds with Normanton as the initial introduction site. Further code implementations in SPLATCHE are needed to test whether these alternative and more complex scenarios are more probable than our scenario 1.

That our spatial model assumes constant demographic parameter values over time and space seems sensible because ecological conditions appear relatively homogeneous over time and space within the entire invaded area. An intriguing possibility that is just beginning to be evaluated theoretically and empirically is that life history traits affecting demography and dispersal (e.g. migration rate, dispersal distance and reproduction ecology) may evolve during range expansion, causing rates of spread to change over time (e.g. Lambrinos 2004; Phillips et al. 2008). For example, Phillips et al. (2006) recently found that the annual rate of progress of the toad invasion front has increased about fivefold since the toad first arrived in Australia (1935), probably owing to adaptive change in traits that increase individual dispersal (e.g. relative leg length). On the other hand, dates of first observation in the Southeast of the expansion range of the species indicate that the mean rate of spread declined more or less linearly between Byron Bay and Woodburn (reviewed in Estoup et al. 2001). The question of a progressive increase in invasion rate (which basically depends on both the growth rate and dispersal distance) in the expansion area studied here within a time scale of <40 years and a geographical scale of approximately 1000 km remains open. Including the possibility of spatial and temporal heterogeneity of demographic parameter values in the SPLATCHE demographic module would provide access to more refined models of spatial expansion (see also Urban et al. 2008). This would allow a more accurate study of processes influencing geographical range expansion. In particular, it would allow formal testing for the occurrence of adaptive change in traits that increase or decrease the invasion rate and identifying ecological factors that favour or disfavour the invasion process.
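The statement that the invasion rate depends on both growth rate and dispersal distance can be illustrated with the classical reaction-diffusion (Fisher–Kolmogorov) approximation, in which the asymptotic front speed is c = s·sqrt(2g) for an intrinsic growth rate g and a standard deviation of dispersal distance s along one axis per unit time. This is only an illustrative sketch under that textbook assumption: it is not the lattice model used in this study, and the function name and numerical values below are hypothetical.

```python
import math


def fisher_front_speed(growth_rate: float, dispersal_sd: float) -> float:
    """Asymptotic front speed under the Fisher-Kolmogorov approximation.

    Assumption (not the paper's lattice model): continuous-time logistic
    growth with intrinsic rate `growth_rate` and diffusive dispersal with
    standard deviation `dispersal_sd` along one axis per unit time, giving
    c = dispersal_sd * sqrt(2 * growth_rate).
    """
    if growth_rate < 0 or dispersal_sd < 0:
        raise ValueError("growth_rate and dispersal_sd must be non-negative")
    return dispersal_sd * math.sqrt(2.0 * growth_rate)


# Hypothetical values chosen only to make the arithmetic easy to follow.
base = fisher_front_speed(growth_rate=0.5, dispersal_sd=10.0)
wider_dispersal = fisher_front_speed(growth_rate=0.5, dispersal_sd=50.0)
faster_growth = fisher_front_speed(growth_rate=12.5, dispersal_sd=10.0)

print(base, wider_dispersal / base, faster_growth / base)
```

Under this approximation the speed is linear in dispersal distance but grows only with the square root of the growth rate, so a fivefold increase in speed would require a fivefold increase in dispersal distance or a 25-fold increase in growth rate.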

Acknowledgements

We thank Craig Moritz and George Roderick for constructive comments on the manuscript and useful discussions. This research was financially supported by the French Agence Nationale de la Recherche grants No NT05-4-42230, ANR-06-BDIV-008-01 and ANR-09-BLAN-0145-01 to AE, JMC, FS and SJEB. This work was also supported by Swiss National Science Foundation grants No 3100A0-126074 and 3100A0-112651.

References

Alford RA, Cohen MP, Crossland MR, Hearnden MN, James D, Schwarzkopf L (1995) Population biology of Bufo marinus in North Australia. In: Wetland Research in the Wet-dry tropics of Australia (ed. Finlayson M), pp. 173–181. Office of the Supervising Scientist Report 101, Canberra, Australia.

Barton NH, Wilson IJ (1995) Genealogies and geography. Philosophical Transactions of the Royal Society of London Series B – Biological Sciences, 349, 49–59.

Barton NH, Wilson IJ (1996) Genealogies and geography. In: New Uses for New Phylogenies (eds Harvey PH, Leigh Brown AJ, Maynard Smith J & Nee S), pp. 23–56. Oxford University Press, Oxford.

Barton NH, Depaulis F, Etheridge AM (2002) Neutral evolution in spatially continuous populations. Theoretical Population Biology, 61, 31–48.

Beaumont M (2008) Joint determination of topology, divergence time, and immigration in population trees. In: Simulations, Genetics and Human Prehistory (eds Matsumura S, Forster P & Renfrew C), pp. 135–154. McDonald Institute for Archaeological Research, Cambridge.

Beaumont MA, Zhang W, Balding DJ (2002) Approximate Bayesian computation in population genetics. Genetics, 162, 2025–2035.

Beaumont MA, Cornuet J-M, Marin J-M, Robert CP (2010) Adaptivity for ABC algorithms, the ABC-PMC scheme. Biometrika, 96, 983–990.

Blum MGB, Francois O (2010) Non-linear regression models for Approximate Bayesian Computation. Statistics and Computing, 20, 63–73.

Buffon G (1777) Essai d’arithmetique morale. Histoire naturelle, generale et particuliere, Supplement 4, 46–123.

Cadotte M, McMahon SM, Fukami T (2006) Conceptual Ecology and Invasion Biology, Reciprocal Approaches to Nature. Series, Invading Nature – Springer Series in Invasion Ecology, 1, 505.

Cornuet JM, Santos F, Beaumont MA et al. (2008) Inferring population history with DIY ABC, a user-friendly approach to approximate Bayesian computation. Bioinformatics, 24, 2713–2719.

Currat M, Ray N, Excoffier L (2004) SPLATCHE, a program to simulate genetic diversity taking into account environmental heterogeneity. Molecular Ecology Notes, 4, 139–142.

Dib C, Faure S, Fizames C et al. (1996) A comprehensive map of the human genome based on 5,264 microsatellites. Nature, 380, 152–154.


Easteal S (1981) The history of introductions of Bufo marinus (Amphibia, Anura); a natural experiment in evolution. Biological Journal of the Linnean Society, 16, 93–113.

Easteal S, Floyd RB (1986) The ecological genetics of introduced populations of the giant toad, Bufo marinus (Amphibia, Anura), dispersal and neighborhood size. Biological Journal of the Linnean Society of London, 27, 17–45.

Easteal S, Van Beurden EK, Floyd R, Sabath M (1985) Continuing geographical spread of Bufo marinus in Australia, range expansion between 1974 and 1980. Journal of Herpetology, 19, 185–188.

Estoup A, Wilson IJ, Sullivan C, Cornuet JM, Moritz C (2001) Inferring population history from microsatellite and enzyme data in serially introduced cane toads, Bufo marinus. Genetics, 159, 1671–1687.

Estoup A, Jarne P, Cornuet JM (2002) Homoplasy and mutation model at microsatellite loci and their consequence for population genetics analysis. Molecular Ecology, 11, 1591–1604.

Estoup A, Beaumont MA, Sennedot F, Moritz C, Cornuet JM (2004) Genetic analysis of complex demographic scenarios, spatially expanding populations of the cane toad, Bufo marinus. Evolution, 58, 2021–2036.

Excoffier L, Estoup A, Cornuet JM (2005) Bayesian analysis of an admixture model with mutations and arbitrarily linked markers. Genetics, 169, 1727–1738.

Fagundes NJ, Ray N, Beaumont M et al. (2007) Statistical evaluation of alternative models of human evolution. Proceedings of the National Academy of Sciences of the USA, 104, 17614–17619.

Francois O, Blum MGB, Jakobsson M, Rosenberg NA (2008) Demographic history of European populations of Arabidopsis thaliana. PLoS Genetics, 4, 5.

Freeland WJ, Martin KC (1985) The rate of range expansion by Bufo marinus in North Australia, 1980-84. Australian Wildlife Research, 15, 555–559.

Gilbert M, Gregoire J-C, Freise JF, Heitland W (2004) Long-distance dispersal and human population density allow the prediction of invasive patterns in the horse chestnut leafminer Cameraria ohridella. Journal of Animal Ecology, 73, 459–468.

Guillemaud T, Beaumont MA, Ciosi M, Cornuet JM, Estoup A (2010) Inferring introduction routes of invasive species using approximate Bayesian computation on microsatellite data. Heredity, 104, 88–99.

Hamilton G, Currat M, Ray N, Heckel G, Beaumont MA, Excoffier L (2005a) Bayesian estimation of recent migration rates after spatial expansion. Genetics, 170, 409–417.

Hamilton G, Stoneking M, Excoffier L (2005b) Molecular analysis reveals tighter social regulation of immigration in patrilocal populations than in matrilocal populations. Proceedings of the National Academy of Sciences of the USA, 102, 7476–7480.

Hastings A, Cuddington K, Davies KF et al. (2005) The spatial spread of invasions, new developments in theory and evidence. Ecology Letters, 8, 91–101.

Hickerson MJ, Meyer C (2008) Testing comparative phylogeographic models of marine vicariance and dispersal using a hierarchical Bayesian approach. BMC Evolutionary Biology, 8, 322. doi: 10.1186/1471-2148-8-322.

Kimura M, Weiss GH (1964) The stepping stone model of genetic structure and the decrease of genetic correlation with distance. Genetics, 49, 561–576.

Kot M, Lewis MA, van den Driessche P (1996) Dispersal data and the spread of invading organisms. Ecology, 77, 2027–2042.

Lambrinos JG (2004) How interaction between ecology and evolution influence contemporary invasion dynamics. Ecology, 85, 2061–2070.

Laval G, Excoffier L (2004) SIMCOAL 2.0, a program to simulate genomic diversity over large recombining regions in a subdivided population with a complex history. Bioinformatics, 20, 2485–2487.

Leuenberger C, Wegmann D (2010) Bayesian computation and model selection without likelihoods. Genetics, 184, 243–252.

Lodge DM (1993) Biological invasions, lessons for ecology. Trends in Ecology and Evolution, 8, 133–137.

Lubina JA, Levin SA (1988) The spread of a reinvading species, range expansion in the California sea otter. American Naturalist, 131, 526–543.

Marjoram P, Molitor J, Plagnol V, Tavare S (2003) Markov chain Monte Carlo without likelihoods. Proceedings of the National Academy of Sciences of the USA, 100, 15324–15328.

Nei M (1987) Molecular Evolutionary Genetics. Columbia University Press, New York. 512 p.

Okubo A, Levin SA (2001) Diffusion and Ecological Problems, Modern Perspectives. Springer-Verlag, New York. 467 p.

Phillips BL, Brown GP, Webb JK, Shine R (2006) Invasion and the evolution of speed in toad. Nature, 439, 803.

Phillips BL, Brown GP, Greenlees M, Webb JK, Shine R (2007) Rapid expansion of the cane toad (Bufo marinus) invasion front in tropical Australia. Austral Ecology, 32, 169–176.

Phillips BL, Brown GP, Travis JM, Shine R (2008) Reid’s paradox revisited, the evolution of dispersal kernels during range expansion. American Naturalist, 172(Suppl. 1), S34–S48.

Pimentel D, McNair S, Janecka J et al. (2001) Economic and environmental threats of alien plant, animal, and microbe invasions. Agriculture Ecosystems and the Environment, 84, 1–20.

Pinhasi R, Fort J, Ammerman AJ (2005) Tracing the origin and spread of agriculture in Europe. PLoS Biology, 3, 12.

Pollock DD, Bergman A, Feldman MW, Goldstein DB (1998) Microsatellite behavior with range constraints, parameter estimation and improved distances for use in phylogenetic reconstruction. Theoretical Population Biology, 53, 256–271.

Pritchard JK, Seielstad MT, Perez-Lezaun A, Feldman MW (1999) Population growth of human Y chromosomes, a study of Y chromosome microsatellites. Molecular Biology and Evolution, 16, 1791–1798.

Ray N, Currat M, Berthier P, Excoffier L (2005) Recovering the geographic origin of early modern humans by realistic and spatially explicit simulations. Genome Research, 15, 1161–1167.

Rice WR, Sax DF (2005) Testing fundamental evolutionary questions at large spatial and demographic scales. In: Species Invasions, Insights into Ecology, Evolution, and Biogeography (eds Sax DF, Stachowicz JJ & Gaines SD), pp. 291–313. Sinauer Associates, Inc. Publishers, Sunderland, Massachusetts, USA.

Rousset F (2004) Genetic Structure and Selection in Subdivided Populations. Princeton University Press, Princeton, New Jersey. 288 p.

Ruiz GM, Rawling TK, Dobbs FC et al. (2000) Global spread of microorganisms by ships. Nature, 408, 49–50.

Sax DF, Stachowicz JJ, Gaines SD (2005) Species invasions, insights into ecology, evolution, and biogeography. Sinauer Associates, Inc. Publishers, Sunderland, Massachusetts, USA. 495 p.


Schwarzkopf L, Alford RA (2002) Nomadic movement in tropical toads. Oikos, 96, 492–506.

Scribner KT, Arntzen JW, Burke T (1997) Effective number of breeding adults in Bufo bufo estimated from age-specific variation at minisatellite loci. Molecular Ecology, 6, 701–712.

Shriner D, Liu Y, Nickle DC, Mullins JI (2006) Evolution of intrahost HIV-1 genetic diversity during chronic infection. Evolution, 60, 1165–1176.

Slatkin M, Maddison WP (1990) Detecting Isolation by distance using phylogenies of genes. Genetics, 126, 249–260.

Solomon H (1978) Geometric Probability. Society for Industrial and Applied Mathematics, Philadelphia.

Tanaka MM, Francis RF, Luciani F, Sisson SA (2006) Using approximate Bayesian computation to estimate tuberculosis transmission parameters from genotype data. Genetics, 173, 1511–1520.

Tavare S, Balding DJ, Griffiths RC, Donnelly P (1997) Inferring coalescence times from DNA sequence data. Genetics, 145, 505–518.

Toni T, Welch D, Strelkova N, Ipsen A, Stumpf MPH (2009) Approximate Bayesian computation scheme for parameter inference and model selection in dynamical systems. Journal of the Royal Society Interface, 6, 187–202.

Urban MC, Phillips BL, Skelly DK, Shine R (2008) A toad more traveled, the heterogeneous invasion dynamics of cane toads in Australia. American Naturalist, 171, 134–148.

Verdu P, Austerlitz F, Estoup A et al. (2009) Origins and Genetic Diversity of Pygmy Hunter-Gatherers from Western Central Africa. Current Biology, 19, 312–318.

Weber E (1998) The dynamics of plant invasions, a case study of three exotic goldenrod species (Solidago L.) in Europe. Journal of Biogeography, 25, 147–154.

Wegmann D, Leuenberger C, Excoffier L (2009) Efficient Approximate Bayesian Computation coupled with Markov Chain Monte Carlo without likelihood. Genetics, 182, 1207–1218.

Weir BS, Cockerham CC (1984) Estimating F-statistics for the analysis of population structure. Evolution, 38, 1358–1370.

Supporting Information

Additional supporting information may be found in the online version of this article:

Table S1 Robustness of scenario comparison using the landscape-ABC method for the cane toad invasion of northern Australia.

Table S2 Robustness of estimation of parameters using the landscape-ABC method for the cane toad invasion of northern Australia.

Table S3 Performance of the landscape-ABC method for comparing scenarios using reference files with 2 × 10^5 and 2 × 10^6 data sets.

Table S4 Performance of the landscape-ABC method for inferring parameters based on reference files with 2 × 10^5 data sets.

Table S5 Observed values of statistics summarizing genetic data for the nine sites sampled within the cane toad expansion area in northern Australia.

Please note: Wiley-Blackwell are not responsible for the content or functionality of any supporting information supplied by the authors. Any queries (other than missing material) should be directed to the corresponding author for the article.

Appendix

General algorithm that lists all the steps of the estimation process, including the steps specific to Baird and Santos’s spatial model (lattice placement and scaling).

The sampling of {O, Q} from flat priors and the mapping of (x, y) coordinates of sampled sites are both processed by the program TilerDurden, and the mt values are provided by the m Vector function. The computations processed in the latter program and function are detailed in Baird & Santos’s paper, this special issue. See Table 1 for parameter definitions and text for details. Flowchart adapted from Figure S1 in Cornuet et al. (2008).

For each scenario i, repeat ni times:

  Draw parameter values from prior distributions:
  - Demographic parameters: , , r, , NF, DS, X, NB, NA
  - Parameters of the lattice to be placed on the 1400 × 250 km studied area: origin O, orientation (angle ), tile spacing X
  - Microsatellite marker parameters: ,

  Compute:
  - the probability of moving between tiles at time t (mt vector)
  - (x, y) coordinates of sampled sites based on the lattice as defined previously

  SIMULATION STEP
  - Simulate genetic data according to scenario and mutation model
  - Compute summary statistics (genetic statistics and dates of first observation)
  - Record parameter and summary statistic values in a reference table file

REJECTION STEP
- Compute distances between observed and simulated summary statistics
- Retain simulated data sets closest to observed data

ESTIMATION STEP
- Estimate posterior probabilities of scenarios
- Estimate posterior distributions of parameters
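The simulation, rejection and estimation steps listed above can be sketched in a few lines of Python. This is a minimal illustration under loudly stated assumptions, not the implementation used in this study: the simulator is a hypothetical toy stand-in for the spatial coalescent simulation, the two parameters, their uniform priors and all function names are invented for the example, and scenario posterior probabilities are estimated by the simplest direct approach (the proportion of each scenario among the retained data sets).

```python
import math
import random


def simulate_summary_stats(scenario, params, rng):
    """Hypothetical toy simulator standing in for the spatial simulation.

    Returns two noisy summary statistics; scenario 2 shifts the first one.
    """
    shift = 0.0 if scenario == 1 else 2.0
    return [params["growth"] + shift + rng.gauss(0.0, 0.1),
            params["dispersal"] + rng.gauss(0.0, 0.1)]


def build_reference_table(scenarios, n_per_scenario, rng):
    """Simulation step: draw from priors, simulate, record every data set."""
    table = []
    for scenario in scenarios:
        for _ in range(n_per_scenario):
            # Assumed flat priors, chosen only for illustration.
            params = {"growth": rng.uniform(0.0, 5.0),
                      "dispersal": rng.uniform(0.0, 5.0)}
            stats = simulate_summary_stats(scenario, params, rng)
            table.append((scenario, params, stats))
    return table


def euclidean(a, b):
    """Distance between two vectors of summary statistics."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def abc_rejection(table, observed, n_keep):
    """Rejection step: keep the n_keep simulated data sets closest to observed."""
    ranked = sorted(table, key=lambda row: euclidean(row[2], observed))
    return ranked[:n_keep]


def scenario_posteriors(retained, scenarios):
    """Estimation step (direct approach): scenario frequencies among retained sets."""
    return {s: sum(1 for row in retained if row[0] == s) / len(retained)
            for s in scenarios}


rng = random.Random(0)
table = build_reference_table([1, 2], 500, rng)
observed = [1.0, 2.0]  # hypothetical observed summary statistics
retained = abc_rejection(table, observed, 50)
posteriors = scenario_posteriors(retained, [1, 2])
print(posteriors)
```

The retained parameter values (the second element of each retained row) form a sample from the approximate posterior distribution of the parameters; in practice the summary statistics would also need to be scaled before distances are computed, which is omitted here for brevity.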
