Tracking the genetic legacy of past human populations through the grid
NICOLAS RAY, UNIVERSITY OF GENEVA & UNEP/GRID-EUROPE
Swiss Grid Day, Bern, November 26th 2009
Human migrations
Adapted from Cavalli-Sforza & Feldman, 2003
[12,000]
[55,000]
Homo sapiens sapiens
1. Better understand human evolution
• Origin of modern humans (when, where, how many?)
• Relationship with other members of the Homo genus
2. Distinguish between the effects of demography and those of selection (biomedical applications)
Why aim for a good demographic model?
Gene-specific factors: mutations, recombination, selection
A complex past demography: fluctuation in effective population size, substructure, migrations
Observed patterns of genetic diversity in contemporary populations
A complex demography
Adapted from Cavalli-Sforza & Feldman, 2003
[10,000]
[55,000]
demographic and spatial expansions
population bottlenecks
fast migration events
population isolation
secondary contacts
SPLATCHE: SPatiaL And Temporal Coalescences in Heterogeneous Environment
(http://cmpg.unibe.ch/software/splatche)
From environment to demography
Spatial resolution: 100 km
Carrying capacity (low to high)
Friction (low to high)
Demographic simulations
stepping-stone model (cellular automata)
Cell or deme
Pop. size vs. time
Demography and spatial expansion
Population density
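The stepping-stone idea above can be sketched in a few lines. This is a minimal illustration, not SPLATCHE's actual implementation: it assumes logistic growth within each deme, a fixed emigration fraction `m`, and emigrants shared among the four neighbouring demes in proportion to the inverse of their friction. The function names, `r`, `m` and the 5 x 5 grid are all hypothetical choices made for the example.

```python
def step(N, K, F, r=0.5, m=0.2):
    """Advance every deme by one generation: logistic growth, then migration."""
    rows, cols = len(N), len(N[0])
    # 1) Logistic growth inside each deme, limited by its carrying capacity K.
    grown = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            n, k = N[i][j], K[i][j]
            grown[i][j] = max(0.0, n + r * n * (1.0 - n / k)) if k > 0 else 0.0
    # 2) Migration: a fraction m leaves each deme and is split among the
    #    habitable neighbours, favouring low-friction cells (assumption).
    new = [row[:] for row in grown]
    for i in range(rows):
        for j in range(cols):
            nbrs = [(a, b) for a, b in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
                    if 0 <= a < rows and 0 <= b < cols and K[a][b] > 0]
            if not nbrs or grown[i][j] <= 0:
                continue
            weights = [1.0 / F[a][b] for a, b in nbrs]
            total = sum(weights)
            emigrants = m * grown[i][j]
            new[i][j] -= emigrants
            for (a, b), w in zip(nbrs, weights):
                new[a][b] += emigrants * w / total
    return new


def simulate_expansion(K, F, origin, n0, generations, r=0.5, m=0.2):
    """Return the list of density grids, one per generation, starting from one deme."""
    N = [[0.0] * len(K[0]) for _ in K]
    N[origin[0]][origin[1]] = float(n0)
    history = [N]
    for _ in range(generations):
        N = step(N, K, F, r, m)
        history.append(N)
    return history


if __name__ == "__main__":
    K = [[100.0] * 5 for _ in range(5)]   # uniform carrying capacity
    F = [[1.0] * 5 for _ in range(5)]     # uniform friction
    history = simulate_expansion(K, F, origin=(0, 0), n0=10, generations=30)
    for row in history[-1]:
        print(" ".join(f"{x:6.1f}" for x in row))
```

Replacing the uniform `K` and `F` grids with values derived from an environmental map is what would turn this toy into a heterogeneous-environment simulation.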
Genetic simulations
Simulated genealogy
Mutation model
ACCT
AGTA
CAAT
CGGT
AATG
CCAT
TGGT
TCCTTGTA…ATTGGT
ACCGAGTA…GTTGGT
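To make the genealogy-plus-mutation step concrete, here is a minimal sketch under strong simplifying assumptions: a single, unstructured population of constant haploid size `N` (a standard Kingman coalescent rather than the spatially explicit coalescent the talk describes) and an infinite-sites mutation model with rate `mu` per lineage per generation. All names and parameter values are illustrative.

```python
import random


def poisson(lam, rng):
    """Draw a Poisson(lam) count by counting rate-1 arrivals in [0, lam]."""
    count, t = 0, rng.expovariate(1.0)
    while t < lam:
        count += 1
        t += rng.expovariate(1.0)
    return count


def simulate_sample(n, N, mu, rng):
    """Simulate a genealogy for n sampled lineages and drop mutations on it.

    Returns one 0/1 haplotype per sampled lineage; each column is a
    segregating site (1 = derived allele).
    """
    # Each active lineage: (set of sampled leaves below it, time it appeared).
    lineages = [({i}, 0.0) for i in range(n)]
    t = 0.0
    sites = []
    while len(lineages) > 1:
        k = len(lineages)
        # Waiting time to the next coalescence among k lineages (generations).
        t += rng.expovariate(k * (k - 1) / (2.0 * N))
        a, b = rng.sample(range(k), 2)
        merged = set()
        for idx in (a, b):
            leaves, born = lineages[idx]
            # Mutations on this branch are inherited by all leaves below it.
            for _ in range(poisson(mu * (t - born), rng)):
                sites.append(frozenset(leaves))
            merged |= leaves
        lineages = [lin for i, lin in enumerate(lineages) if i not in (a, b)]
        lineages.append((merged, t))
    return [[1 if i in s else 0 for s in sites] for i in range(n)]


if __name__ == "__main__":
    rng = random.Random(1)
    for hap in simulate_sample(n=8, N=1000, mu=0.001, rng=rng):
        print("".join(map(str, hap)))
```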
Summary statistics
– Within population:
• S, π
– Between populations:
• Pairwise FST
• Global FST
– Globally:
• S, π
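As an illustration of these statistics, the sketch below computes S (number of segregating sites), the diversity measure listed with it (taken here to be π, the mean number of pairwise differences), and a pairwise FST. The FST formula used, 1 - (mean within-population diversity) / (mean between-population diversity), is one simple estimator chosen for the example; the slides do not say which estimator was used, and the sequences are made up.

```python
from itertools import combinations, product


def segregating_sites(seqs):
    """S: number of alignment columns showing more than one state."""
    return sum(1 for col in zip(*seqs) if len(set(col)) > 1)


def hamming(a, b):
    """Number of positions at which two aligned sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def nucleotide_diversity(seqs):
    """pi: mean number of differences over all pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    return sum(hamming(a, b) for a, b in pairs) / len(pairs)


def pairwise_fst(pop1, pop2):
    """FST between two populations as 1 - within / between (assumed estimator)."""
    within = (nucleotide_diversity(pop1) + nucleotide_diversity(pop2)) / 2.0
    between = sum(hamming(a, b) for a, b in product(pop1, pop2)) / (len(pop1) * len(pop2))
    return 1.0 - within / between if between > 0 else 0.0


if __name__ == "__main__":
    pop1 = ["AAAA", "AAAT"]   # toy data, not from the talk
    pop2 = ["TTTT", "TTTA"]
    print("S within pop1:", segregating_sites(pop1))
    print("pi within pop1:", nucleotide_diversity(pop1))
    print("S global:", segregating_sites(pop1 + pop2))
    print("pairwise FST:", round(pairwise_fst(pop1, pop2), 3))
```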
Draw parameter values from priors
Simulate one genealogy
Generate genetic data
Compute summary statistics
(1-10 million iterations)
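The loop above is the core of a rejection-style ABC scheme. The sketch below shows it on a deliberately tiny problem: one parameter (a scaled mutation rate `theta`), a uniform prior, and a single summary statistic S simulated under a standard coalescent. The prior bounds, tolerance, model and function names are assumptions made for the example, and the simple accept/reject rule stands in for whatever estimation procedure was actually used.

```python
import random


def poisson(lam, rng):
    """Draw a Poisson(lam) count by counting rate-1 arrivals in [0, lam]."""
    count, t = 0, rng.expovariate(1.0)
    while t < lam:
        count += 1
        t += rng.expovariate(1.0)
    return count


def simulate_S(theta, n, rng):
    """Simulate one genealogy for n samples and return its number of mutations."""
    total_length = 0.0
    for k in range(2, n + 1):
        # Time with k lineages (coalescent units), summed over the k branches.
        total_length += k * rng.expovariate(k * (k - 1) / 2.0)
    return poisson(theta / 2.0 * total_length, rng)


def abc_rejection(s_obs, n, n_sims, tol, prior=(0.0, 20.0), seed=0):
    """Keep the parameter draws whose simulated S is within tol of the observed S."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_sims):
        theta = rng.uniform(*prior)          # 1) draw parameter from the prior
        s_sim = simulate_S(theta, n, rng)    # 2-4) simulate data, compute statistic
        if abs(s_sim - s_obs) <= tol:        # 5) compare with the observation
            accepted.append(theta)
    return accepted


if __name__ == "__main__":
    posterior = abc_rejection(s_obs=20, n=10, n_sims=5000, tol=1)
    print("accepted draws:", len(posterior))
    print("posterior mean of theta:", sum(posterior) / len(posterior))
```

Because each iteration is independent of the others, the loop can be cut into chunks and run on as many CPUs as are available.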
Computer clusters
UBELIX (>500 nodes)
Zooblythii (~40 nodes)
APPROXIMATE BAYESIAN COMPUTATIONS (ABC): COMPUTATIONAL ISSUES
A fully spatially-explicit model using 500 loci in 800 individuals:
10 CPU-years
Adding long-distance dispersal:
20 CPU-years
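A quick back-of-the-envelope conversion shows what these figures mean in wall-clock time. The sketch assumes perfect scaling (independent simulations, every CPU busy all the time, no queuing or failures), which is optimistic; the 500-CPU figure is only an example.

```python
def wall_clock_days(cpu_years, n_cpus, days_per_year=365.0):
    """Ideal wall-clock duration, in days, of a workload spread over n_cpus."""
    return cpu_years * days_per_year / n_cpus


if __name__ == "__main__":
    for cpu_years in (10, 20):
        days = wall_clock_days(cpu_years, n_cpus=500)
        print(f"{cpu_years} CPU-years on 500 CPUs: about {days:.1f} days")
```

Under these assumptions, 10 CPU-years shrink to roughly a week on 500 CPUs, and 20 CPU-years to roughly two weeks.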
SPLATCHE on the grid
early 2005: joined the Biomed VO of the EGEE project
mid 2005: tested on GILDA test bed, and deployed on the Grid
since mid 2006: production mode and optimization
Use of SPLATCHE on the grid
N simulations
Independent simulations:
- the more CPUs, the better
- job failures are not that bad
GRID
Posterior distribution of demographic/genetic parameters of interest
Statistical tools
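Why independent simulations tolerate job failures can be shown with a small sketch: the N simulations are split into fixed-size jobs, and any job that fails is simply run again without affecting the others. Nothing here talks to real grid middleware; `run_job` is a hypothetical stand-in that fails at random, and the job size and retry limit are arbitrary.

```python
import random


def plan_jobs(n_simulations, sims_per_job):
    """Split n_simulations into (first_index, count) chunks, one per job."""
    jobs, start = [], 0
    while start < n_simulations:
        count = min(sims_per_job, n_simulations - start)
        jobs.append((start, count))
        start += count
    return jobs


def make_flaky_job(failure_rate, seed=0):
    """Build a fake job runner that is lost with probability failure_rate."""
    rng = random.Random(seed)

    def run_job(job):
        start, count = job
        if rng.random() < failure_rate:
            raise RuntimeError("job lost")
        # Stand-in for real outputs: the indices of the simulations it ran.
        return [start + i for i in range(count)]

    return run_job


def run_campaign(jobs, run_job, max_rounds=50):
    """Run all jobs, resubmitting only the failed ones, for up to max_rounds."""
    results, pending = {}, list(jobs)
    for _ in range(max_rounds):
        if not pending:
            break
        failed = []
        for job in pending:
            try:
                results[job] = run_job(job)
            except RuntimeError:
                failed.append(job)   # a failure costs one job, nothing else
        pending = failed
    return results, pending


if __name__ == "__main__":
    jobs = plan_jobs(n_simulations=1050, sims_per_job=100)
    results, pending = run_campaign(jobs, make_flaky_job(failure_rate=0.3))
    done = sum(len(out) for out in results.values())
    print(f"{len(jobs)} jobs, {done} simulations completed, {len(pending)} jobs still pending")
```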
Optimizations
5 mio. simulations
GRID
Reduction of the number of simulations (Daniel Wegmann): by MCMC. Promising results (~50 times fewer simulations)
Submission time: multi-threaded application using up to 30 RBs (used for the WISDOM project)
Fetching time of job outputs: in-house multi-threaded solution for checking status and getting outputs
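The multi-threaded status checking and output fetching mentioned above could look roughly like the sketch below. It is not the in-house tool from the talk: `check_status` and `fetch_output` are hypothetical callables standing in for whatever middleware commands are used, and the `"Done"` status string and worker count are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def poll_and_fetch(job_ids, check_status, fetch_output, workers=8):
    """Query all job statuses in parallel, then fetch finished outputs in parallel.

    Returns (outputs, pending): a dict of outputs keyed by job id and the
    list of job ids that are not finished yet.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Status queries are I/O-bound, so threads overlap the waiting time.
        statuses = list(pool.map(check_status, job_ids))
        done = [j for j, s in zip(job_ids, statuses) if s == "Done"]
        outputs = dict(zip(done, pool.map(fetch_output, done)))
    pending = [j for j, s in zip(job_ids, statuses) if s != "Done"]
    return outputs, pending


if __name__ == "__main__":
    # Fake back end for demonstration: even-numbered jobs have finished.
    fake_status = {i: ("Done" if i % 2 == 0 else "Running") for i in range(10)}
    outputs, pending = poll_and_fetch(
        list(fake_status),
        check_status=lambda j: fake_status[j],
        fetch_output=lambda j: f"results of job {j}",
    )
    print("fetched:", sorted(outputs))
    print("still pending:", pending)
```

Calling `poll_and_fetch` repeatedly on the returned `pending` list would drain a campaign as jobs complete.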
Geographic origin of human dispersal
Ray et al. (2005) Genome Research
Interactions among populations
Interaction between modern humans and Neanderthals in Europe
Currat & Excoffier (2004), PLoS Biol.
Plausible introduction site 1: LAGOON CREEK (first sighting: 1979)
Initial introduction site in Australia: GORDONVALE (1935)
Plausible introduction site 2: NORMANTON (first sighting: 1964)
[Map labels: KDM, NW, B, T, RE; years shown: 1982, 1988, 1992, 1995, 1996, 1997, 1998, 1999; scale bar in kilometers]
Cane toad invasion in Australia
Estoup, A., Baird, S. J. E., Ray, N., Currat, M., Cornuet, J.-M., Santos, F., Beaumont, M. A. and L. Excoffier. Combining genetic, historical and geographic data to reconstruct the dynamics of the bioinvasion of cane toad Bufo marinus. Submitted
Take-home message
A good human demographic model is important
Realistic spatially-explicit approaches are essential
The grid is key for sufficient exploration of parameter space
User support and connections outside one’s discipline are crucial
THANK YOU!