Human Migrations
Anjalee Sujanani
CS 374 – Algorithms in Biology Tuesday, October 24, 2006
Papers
Review: The application of molecular genetic approaches to the study of human evolution. L. Luca Cavalli-Sforza & Marcus W. Feldman. Nature Genetics 33, 266-275 (2003).
Recovering the geographic origin of early modern humans by realistic and spatially explicit simulations. Nicolas Ray, Mathias Currat, Pierre Berthier, and Laurent Excoffier. Genome Res., Aug 2005; 15: 1161-1167.
Support from the relationship of genetic and geographic distance in human populations for a serial founder effect originating in Africa. Sohini Ramachandran, Omkar Deshpande, Charles C. Roseman, Noah A. Rosenberg, Marcus W. Feldman, and L. Luca Cavalli-Sforza. PNAS, November 1, 2005, Vol. 102, No. 44.
Timeline: Study of Genetic Variation
1919: Existence of human genetic variation first demonstrated in a study of the ABO gene
1966: Studies showed that almost every protein has genetic variants
  These variants became useful markers for population studies
  Marker: a gene or other segment of DNA whose position on a chromosome is known
1980: A new method to construct a genetic linkage map of the human genome by using radioisotopes generated more new markers
1986: Polymerase Chain Reaction (PCR) developed
  Allows a small amount of the DNA molecule to be amplified exponentially
  Expanded the number of studies that could work directly with DNA
1990s: Development of automated DNA sequencing
Shaping of Genetic Variation
Human evolution
Genome structure
Population history
Human migration
Dating origin of our species
Tracking migrations of our species using DNA
Relationship of separated human populations
http://www.ncbi.nlm.nih.gov/Class/NAWBIS/Modules/Variation/var17.html
Gene Variation
All genetic variation is caused by mutations
Most common: Single Nucleotide Polymorphisms (SNPs)
  A single base change occurring in a population at a frequency of >1% is termed a single nucleotide polymorphism (SNP)
  A single base change occurring at <1% is considered to be a mutation
SNP: DNA sequence variation occurring when a single nucleotide (A, T, C, or G) in the genome (or other shared sequence) differs between members of a species
  e.g. AAGCCTA
       AAGCTTA
  Here, we say that there are two alleles: C and T
Allelic frequency: the proportion of gene copies in a population that carry a given allele at a single nucleotide position
Polymorphism: variation in DNA sequences between individuals
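The definitions above can be made concrete with a small sketch. It counts the alleles at one position of a toy set of aligned sequences and applies the 1% rule from the slide. The function names and the choice to test the second most common allele against the threshold are assumptions made for illustration.

```python
from collections import Counter

def allele_frequencies(sequences, position):
    """Return {allele: frequency} at a 0-based position across aligned sequences."""
    counts = Counter(seq[position] for seq in sequences)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}

def is_snp(freqs, threshold=0.01):
    """Treat a site as a SNP if its second most common allele exceeds the threshold."""
    if len(freqs) < 2:
        return False  # only one allele observed: no variation at this site
    second_most_common = sorted(freqs.values(), reverse=True)[1]
    return second_most_common > threshold

# Toy sample: three copies of AAGCCTA and one of AAGCTTA (they differ at index 4).
sample = ["AAGCCTA", "AAGCCTA", "AAGCCTA", "AAGCTTA"]
freqs = allele_frequencies(sample, 4)
print(freqs, is_snp(freqs))
```

In a real data set the frequencies would be estimated from many sampled chromosomes, so a 1% cutoff only becomes meaningful with samples well above 100 gene copies.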
Allele Frequency Change (1)
Natural Selection: tendency of beneficial alleles to become more common over time and detrimental ones less common
Random Genetic Drift: fundamental tendency of any allele to vary randomly in frequency over time due to statistical variation alone
=> Both can lead to elimination or fixation of an allele
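Drift leading to loss or fixation can be illustrated with a minimal simulation. The sketch below assumes the Wright-Fisher model (random resampling of a fixed number of gene copies each generation); the slides do not name a model, so this choice and all parameter values are assumptions.

```python
import random

def wright_fisher(p0, n_copies, generations, rng):
    """Track one allele's frequency under pure drift (no selection or migration)."""
    p = p0
    trajectory = [p]
    for _generation in range(generations):
        # Every gene copy in the next generation is drawn independently
        # from the current generation's allele frequency.
        count = sum(1 for _copy in range(n_copies) if rng.random() < p)
        p = count / n_copies
        trajectory.append(p)
        if p in (0.0, 1.0):  # allele eliminated or fixed: no further change
            break
    return trajectory

rng = random.Random(1)
traj = wright_fisher(p0=0.5, n_copies=20, generations=10_000, rng=rng)
print(len(traj) - 1, "generations until the allele was lost or fixed at", traj[-1])
```

With only 20 gene copies the allele is absorbed quickly; raising `n_copies` slows the process, which is the sense in which small populations are more exposed to drift.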
Allele Frequency Change (2)
Migration of individuals
  On average, 1 immigrant per generation into a population is sufficient to keep drift partially in check
  Avoids complete fixation of alleles
Migration of whole populations that settle elsewhere
  If the founding group is initially small and then expands:
    founder frequency vs. original population frequency: Δ
    founder frequency vs. new location population frequency: Δ
  Creates more chances for drift
  Causes divergence and intergroup variation in allele frequencies
=> Group migration has the opposite effect to individual migration
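The difference Δ between founder and source frequencies can be sketched by sampling. The code below draws founder groups of two sizes from a large source population and compares the average shift in allele frequency. All names and numbers are hypothetical and chosen only to show the effect of founder group size.

```python
import random

def founder_frequency(source_freq, n_founder_copies, rng):
    """Allele frequency in a founder group drawn at random from a large source population."""
    carried = sum(1 for _copy in range(n_founder_copies) if rng.random() < source_freq)
    return carried / n_founder_copies

def mean_abs_shift(source_freq, n_founder_copies, trials, rng):
    """Average |founder frequency - source frequency| over many independent foundings."""
    shifts = [
        abs(founder_frequency(source_freq, n_founder_copies, rng) - source_freq)
        for _trial in range(trials)
    ]
    return sum(shifts) / trials

rng = random.Random(7)
small = mean_abs_shift(0.3, n_founder_copies=10, trials=500, rng=rng)
large = mean_abs_shift(0.3, n_founder_copies=1000, trials=500, rng=rng)
print(f"mean shift with 10 founder copies: {small:.3f}, with 1000: {large:.3f}")
```

Repeating such a small-sample founding at each step of a migration route is the intuition behind a serial founder effect.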
Population History (1)
Large populations that are geographically and genetically distant: history can be inferred by population trees
Assumptions:
  Fissions occur randomly in time
  Constant rate of neutral evolution in each population between fissions
Neutral evolution:
  Many of the changes that occur during evolution are selectively neutral
  The frequency of a selectively neutral allele is as likely to decrease as to increase by genetic drift
  On average, the frequencies of neutral alleles remain unchanged from one generation to the next
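The last point can be written as an expectation. As a sketch, assume the Wright-Fisher model of drift (an assumption; the slides do not name a model) with N diploid individuals, and let p_t be the frequency of a neutral allele in generation t:

```latex
\mathbb{E}\left[\, p_{t+1} \mid p_t \,\right] = p_t,
\qquad
\operatorname{Var}\left[\, p_{t+1} \mid p_t \,\right] = \frac{p_t \,(1 - p_t)}{2N}
```

The mean does not move, but the variance grows as N shrinks, so individual populations still wander away from each other over time.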
Summary Tree of World Population
Polymorphisms of 120 protein genes in 1,915 populations
Nature genetics supplement, Volume 33, March 2003, pg 267
High Resolution Population History (1)
mtDNA: maternally inherited
High Resolution Population History (2)
Y chromosome: paternally inherited
Population History (2)
Geographically close populations: genetic and geographic distances are often highly correlated
Genetic distance of population pairs measured by FST
  Plotted as a function of the geographic distance between members of population pairs
Genetic distance reaches an asymptote at 1,000-2,600 miles on average
  World and Asia are higher since they are not at equilibrium
Nature genetics supplement, Volume 33, March 2003, pg 268
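For readers unfamiliar with FST, here is a minimal sketch of one common textbook definition, FST = (H_T - H_S) / H_T, for a single biallelic locus and two equally weighted populations. This is an illustrative estimator, not necessarily the one used in the cited review.

```python
def heterozygosity(p):
    """Expected heterozygosity 2p(1-p) at a biallelic locus with allele frequency p."""
    return 2.0 * p * (1.0 - p)

def fst_two_populations(p1, p2):
    """F_ST = (H_T - H_S) / H_T for two equally weighted populations."""
    p_bar = (p1 + p2) / 2.0
    h_total = heterozygosity(p_bar)          # heterozygosity of the pooled population
    if h_total == 0.0:
        return 0.0                           # locus is monomorphic overall
    h_within = (heterozygosity(p1) + heterozygosity(p2)) / 2.0
    return (h_total - h_within) / h_total

print(fst_two_populations(0.5, 0.5))  # identical frequencies
print(fst_two_populations(0.2, 0.8))  # diverged frequencies
print(fst_two_populations(0.0, 1.0))  # fixed for different alleles
```

The value runs from 0 (no differentiation) to 1 (populations fixed for different alleles), which is why it can serve as a genetic distance between population pairs.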
14
Population History (3)
Statistical methods:
Principal Components
  Good for when migration is frequent between neighbors
  Gives similar results to trees (when the tree assumptions hold)
Cluster Detection
  Good for analyzing large population genetic datasets
  Produces the same primary continental clusters as earlier methods
15
Tracking migrations using DNA
‘Standard model of modern human evolution’
  Expansion from East Africa into the rest of Africa by ~1,000 individuals
  Second expansion: into Asia
Nature genetics supplement, Volume 33, March 2003, pg 270
16
Serial Founder Effect Originating in Africa
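Found a linear relationship between genetic and geographic distances in a world-wide sample of human populations
Ramachandran et al., PNAS 102(44)
[Figure: genetic distance against geographic distance for population pairs. Legend categories: within-region comparison; Africa vs. Eurasia; America vs. Oceania (plot symbols not preserved)]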
Waypoints:
Fixed locations used in estimation of between-continental distances.
Makes estimates more reflective of human migration patterns
Based on belief humans did not cross large bodies of water while migrating (until recently)
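The waypoint idea can be sketched as a sum of great-circle legs. The code below uses the haversine formula and routes the path through an intermediate point instead of measuring a straight line across water. The coordinates and the choice of waypoint are illustrative assumptions, not values taken from the paper.

```python
import math

EARTH_RADIUS_KM = 6371.0

def great_circle_km(a, b):
    """Haversine distance between two (latitude, longitude) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))

def distance_via_waypoints(start, end, waypoints=()):
    """Sum the great-circle legs along start -> waypoints... -> end."""
    path = [start, *waypoints, end]
    return sum(great_circle_km(p, q) for p, q in zip(path, path[1:]))

# Approximate, illustrative coordinates (hypothetical inputs).
addis_ababa = (9.0, 38.7)
cairo = (30.0, 31.0)       # assumed waypoint between Africa and Eurasia
beijing = (39.9, 116.4)

direct = distance_via_waypoints(addis_ababa, beijing)
routed = distance_via_waypoints(addis_ababa, beijing, [cairo])
print(f"direct: {direct:.0f} km, via waypoint: {routed:.0f} km")
```

Routing through a waypoint can only lengthen the path, so between-continent distances become larger and closer to a plausible overland route.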
Alternate Hypothesis
‘Multiregional model’
  Human populations originated in each continent and evolved in parallel
Study: Recovering the geographic origin of early modern humans by realistic and spatially explicit simulations
  Quantifies the likelihood of the Unique Origin (UO) model relative to the Multiregional Evolution (ME) model
Nature genetics supplement, Volume 33, March 2003, pg 270
Method (1)
Simulations designed to match a previous study that used 377 Short Tandem Repeats (STRs): 1052 individuals, 52 populations
Rosenberg, N.A., Pritchard, J.K., Weber, J.L., Cann, H.M., Kidd, K.K., Zhivotovsky, L.A., and Feldman, M.W. 2002. Genetic structure of human populations. Science 298: 2381–2385
STR:
  A common class of polymorphism, consisting of a pattern of two or more nucleotides repeating in tandem
  Repeat unit: 2-10 base pairs
  e.g. GATAGATAGATAGATAGATAGATA
[Figure from Rosenberg et al. 2002. Labels: individuals; lines separating different populations; populations (e.g. Han); regions (e.g. East Asia)]
Method (2)
Considered populations with sample size > 20 individuals
  Called the Rosenberg22 data set
Nicolas Ray, Mathias Currat, Pierre Berthier, and Laurent Excoffier, Recovering the geographic origin of early modern humans by realistic and spatially explicit simulations, Genome Res., Aug 2005; 15: 1161-1167.
Carrying Capacity:
  Population level that can be supported for an organism, given the quantity of food, habitat, water and other life infrastructure present
Friction:
  The relative difficulty of moving through a deme
Deme:
  A sub-population
Land surface of the Old World was divided into a grid of 9,226 demes, each 100 x 100 km
Simulations
Used software called SPLATCHE
Step 1: Forward time simulation of demographic and spatial expansion using environmental information
  For each generation, record:
    number of individual genes per deme
    number of immigrant genes from the 4 nearest-neighboring demes
Step 2: Backward time simulation of genealogy and gene diversity sampled at given locations, using information generated in Step 1
Step 1
Origin of each expansion: single deme with 50 diploid individuals (100 nuclear genes)
  25 origins of expansion
Onset of expansion: 4,000 generations ago (about 120,000 years ago, assuming 30 years per generation)
Each generation, occupied demes are subject to:
  Growth phase
    Constant growth rate r = 0.3
    Carrying capacity K depends on the environment of the deme
  Emigration phase
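The growth phase can be sketched as follows. The slide gives only the growth rate r and a carrying capacity K, so the discrete logistic update used here is an assumption about the functional form, and K = 1000 is a hypothetical value.

```python
def logistic_step(n, r, k):
    """One generation of discrete logistic growth toward carrying capacity k."""
    return n + r * n * (1.0 - n / k)

def grow(n0, r, k, generations):
    """Return the deme size for generation 0 through `generations`."""
    sizes = [n0]
    for _generation in range(generations):
        sizes.append(logistic_step(sizes[-1], r, k))
    return sizes

# 100 genes at the origin and r = 0.3 come from the slide; k is hypothetical.
sizes = grow(n0=100, r=0.3, k=1000, generations=60)
print([round(s) for s in sizes[:12]], "...", round(sizes[-1]))
```

Growth is close to exponential while the deme is nearly empty and flattens as it fills, so a deme with low K saturates at a small size and sends out few emigrants.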
Step 1 – Emigration Phase
Distributed 0.05 Nt emigrants to the 4 nearest neighboring demes
  Nt = size of deme at time t
Exact number of emigrants sent to each deme (Ei) controlled through friction values (Fi) for each deme
  Friction = relative difficulty of moving through a deme
  Fi values kept within the range 0.1 to 1
Ei: drawn from a multinomial distribution [formula not preserved]
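A multinomial split of emigrants among four neighbors can be sketched as below. The formula linking friction to the multinomial probabilities is not preserved in this copy, so the weights are passed in as a parameter. The example weights (proportional to 1 / Fi) and the deme size are placeholder assumptions, not the rule used by SPLATCHE.

```python
import random

def split_emigrants(n_emigrants, weights, rng):
    """Draw one multinomial sample: how many of n_emigrants go to each neighbor."""
    counts = [0] * len(weights)
    destinations = rng.choices(range(len(weights)), weights=weights, k=n_emigrants)
    for destination in destinations:
        counts[destination] += 1
    return counts

deme_size = 400                          # N_t, hypothetical
n_emigrants = round(0.05 * deme_size)    # 0.05 * N_t emigrants, as on the slide
frictions = [0.1, 0.5, 1.0, 0.2]         # F_i of the 4 neighbors, hypothetical values

# ASSUMPTION: easier terrain (lower friction) attracts more emigrants; inverse
# friction is used purely as a placeholder weighting.
weights = [1.0 / f for f in frictions]

emigrants = split_emigrants(n_emigrants, weights, random.Random(3))
print(emigrants, "total:", sum(emigrants))
```

Whatever weighting is used, the counts always sum to the number of emigrants, which is the property the multinomial draw guarantees.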
Step 2
For each of 25 geographic origins, performed 10,000 simulations of genetic diversity
  Simulations generated molecular diversity data at a given number of STR loci for each of the 22 population samples
Tested 10 evolutionary scenarios
  ME Model: 9 combinations of population size and migration rates
For each data set:
  Index of population differentiation (RST) was computed between all pairs of populations
  Provided a measure of genetic divergence between populations
Scenarios 26 – 28:
Same population size, different migration rates
Scenarios 29 – 31:
Africa population size > Asia and Europe population sizes
Migration rate adjusted so that the number of emigrants stayed the same
Scenarios 32 – 34:
Africa population size > Asia and Europe population sizes
Same migration rates
Timeline
UO model:
1. 30,000 generations ago:
Demographic expansion following first speciation event
2. For 26,000 generations:
Large, subdivided population exists
3. 4,000 generations ago:
Bottleneck of 10 generations followed by range expansion
ME model:
1. 30,000 generations ago:
Small population went through speciation & instantaneously colonized 3 continents
2. For 26,000 generations:
Continents had large populations & exchanged occasional migrants
3. 4,000 generations ago:
Three range expansions from three different origins (shown in C)
Results - Analysis (1)
What they wanted to do:
1. Assume a given genetic data set is the product of a specific evolutionary scenario
2. Estimate likelihood of all scenarios that can generate that data
3. Choose scenario maximizing the likelihood
What they did:
1. Select a restricted number of ME and UO scenarios
2. For each, replace likelihood by a measure of goodness-of-fit of genetic data to the model
3. Choose scenario maximizing goodness of fit
Results - Analysis (2)
Goodness-of-fit between observed data and model determined using simulations:
Computed a correlation coefficient between the observed and simulated values of the index of population differentiation (RST) calculated earlier
Repeated this for many simulations per model to get a probability distribution of the coefficient under each model
90% quantile of the distribution used as goodness-of-fit index
Value chosen as a result of previous extensive simulation experience
Called the index the R90 statistic
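The R90 procedure can be sketched in a few lines. The slide says only "a correlation coefficient", so Pearson's r is an assumption here, as are the quantile interpolation rule, the toy RST values, and the noise used to fake the simulated data sets.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def quantile(values, q):
    """Empirical quantile with linear interpolation between order statistics."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])

def r90(observed, simulated_sets):
    """90% quantile of the observed-vs-simulated correlations for one scenario."""
    return quantile([pearson(observed, sim) for sim in simulated_sets], 0.90)

rng = random.Random(0)
observed = [0.02, 0.05, 0.11, 0.08, 0.15, 0.20]  # toy pairwise RST values
simulated_sets = [[v + rng.gauss(0, 0.03) for v in observed] for _ in range(1000)]
score = r90(observed, simulated_sets)
print(f"R90 = {score:.3f}")
```

Using an upper quantile rather than the mean rewards scenarios that can produce data resembling the observations reasonably often, even if most individual simulations fit poorly.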
Results - Validation
Simulations were used to evaluate how correctly the geographic origin of an expansion could be recovered from STR data
10,000 simulations performed per scenario
  Divided into 2 sets of 5,000 simulations
  First 5,000 runs under each scenario used as pseudo-observed data
  Compared to second 5,000 generated under all scenarios
Geographic origin of expansion assigned to the scenario with the largest R90 statistic
Tallied how many times pseudo-observed simulations were correctly assigned to the true origin for each model
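The assignment and tally steps can be sketched as below. The origin names and R90 values are made up for illustration; in the study each pseudo-observed data set would carry one R90 value per candidate scenario.

```python
def assign_origin(r90_by_origin):
    """Pick the candidate origin whose scenario has the largest R90 value."""
    return max(r90_by_origin, key=r90_by_origin.get)

def recovery_rate(trials):
    """Fraction of (true_origin, {origin: R90}) pairs assigned to their true origin."""
    trials = list(trials)
    hits = sum(1 for true_origin, scores in trials if assign_origin(scores) == true_origin)
    return hits / len(trials)

# Hypothetical pseudo-observed runs with made-up R90 values.
trials = [
    ("East Africa", {"East Africa": 0.10, "North Africa": 0.07, "Asia": 0.02}),
    ("East Africa", {"East Africa": 0.06, "North Africa": 0.09, "Asia": 0.01}),
    ("Asia",        {"East Africa": 0.03, "North Africa": 0.02, "Asia": 0.08}),
]
rate = recovery_rate(trials)
print(f"correctly assigned: {rate:.2f}")
```

Because the true origin of each pseudo-observed run is known, the resulting rate measures how reliable the R90 rule is before it is applied to real data.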
Results - Distinguishing UO vs. ME models (1)
Used a similar validation method to differentiate data sets generated under the two models
A data set is correctly assigned if the chosen scenario belongs to the same evolutionary model that generated it, i.e.
  Is a data set generated under a UO scenario assigned to any geographic origin under the UO model?
  Is a data set generated under any ME scenario assigned to any ME scenario?
Correct regardless of location of origin or ME scenario
Results - Distinguishing UO vs. ME models (2)
Results showed the evolutionary models were well discriminated with a single locus:
  UO correct assignment frequency >> ME recovery rate
Results - Unknown Origins
How is the probability of recovering the source of expansion affected if we assume an incorrect geographical origin?
Performed 10,000 simulations on 14 alternative ‘true’ origins
Genetic data sets from 25 simulated potential origins compared with the ‘true’ origins
Measured probability of recovering correct geographic region of origin (4 regions)
Result: correct assignment per region was much higher when the origin was known:
  0.771 (unknown) vs. 0.882 (known) with 20 loci
  0.852 (unknown) vs. 0.999 (known) with 377 loci
Results – Human Nuclear STRs
R90 goodness-of-fit statistics between the original Rosenberg22 data and scenario-generated data showed:
  UO model fit better overall
  BUT pointed to a North African origin
Suspected this was caused by the 377 STRs having a European ascertainment bias
  Ran tests to see if results would be affected by an ascertainment bias
  Tests showed the probability of inferring the correct origin drops when using biased data
Re-computed R90 statistics after correcting for the bias, and found an East African origin was now more favored
  Agrees with other recent studies
Summary
Attempted to find the geographic origin of modern humans from patterns in current world-wide genetic diversity
Explicitly accounted for physical constraints to dispersion
Results showed that the origin can be well recovered if:
  We have a large number of markers
  Markers do not suffer ascertainment bias
  Simulated origin is close to true origin
Simulations showed UO and ME models could be clearly distinguished
UO model favored
  R90 statistic about four times higher for best UO scenario (0.1) vs. best ME scenario (0.023)
Thank you