Human Migrations Anjalee Sujanani CS 374 – Algorithms in Biology Tuesday, October 24, 2006.

Human Migrations

Anjalee Sujanani

CS 374 – Algorithms in Biology Tuesday, October 24, 2006


Papers

Review: The application of molecular genetic approaches to the study of human evolution, L. Luca Cavalli-Sforza & Marcus W. Feldman, Nature Genetics 33, 266-275 (2003)

Recovering the geographic origin of early modern humans by realistic and spatially explicit simulations, Nicolas Ray, Mathias Currat, Pierre Berthier, and Laurent Excoffier, Genome Res., Aug 2005; 15: 1161-1167.

Support from the relationship of genetic and geographic distance in human populations for a serial founder effect originating in Africa, Sohini Ramachandran, Omkar Deshpande, Charles C. Roseman, Noah A. Rosenberg, Marcus W. Feldman, and L. Luca Cavalli-Sforza, PNAS, November 1, 2005, Vol. 102, No. 44


Timeline: Study of Genetic Variation

1919: Existence of human genetic variation first demonstrated in a study of the ABO gene

1966: Studies showed that almost every protein has genetic variants

These variants became useful markers for population studies

Marker:

A gene or other segment of DNA whose position on a chromosome is known

1980: A new method to construct a genetic linkage map of the human genome by using radioisotopes generated more new markers

1986: Polymerase Chain Reaction (PCR) developed

Allows a small amount of the DNA molecule to be amplified exponentially

Expanded the number of studies that could work directly with DNA

1990s: Development of automated DNA sequencing


Shaping of Genetic Variation

Human evolution

Genome structure

Population history

Human migration

Dating origin of our species

Tracking migrations of our species using DNA

Relationship of separated human populations

http://www.ncbi.nlm.nih.gov/Class/NAWBIS/Modules/Variation/var17.html

Gene Variation

All genetic variation is caused by mutations

Most common: Single Nucleotide Polymorphisms

A single base change occurring in a population at a frequency of >1% is termed a single nucleotide polymorphism (SNP).

When a single base change occurs at <1%, it is considered to be a mutation

SNPs: DNA sequence variation occurring when a single nucleotide - A, T, C, or G - in the genome (or other shared sequence) differs between members of a species.

e.g. AAGCCTA

AAGCTTA

Here, we say that there are two alleles: C and T

Allelic frequency: Defines variation at a single nucleotide position

Polymorphism: Variation in DNA sequences between individuals
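The SNP example above can be sketched in a few lines of Python. This is only an illustration: the helper names `variable_sites` and `allele_frequencies` and the five-sequence sample are assumptions made up for the sketch, not data from the papers.

```python
from collections import Counter


def variable_sites(sequences):
    """Return {position: Counter of alleles} for columns where aligned sequences differ."""
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned to the same length")
    sites = {}
    for pos in range(length):
        alleles = Counter(s[pos] for s in sequences)
        if len(alleles) > 1:  # more than one allele observed at this position
            sites[pos] = alleles
    return sites


def allele_frequencies(alleles):
    """Convert allele counts at one site into relative frequencies."""
    total = sum(alleles.values())
    return {allele: count / total for allele, count in alleles.items()}


# Hypothetical sample built from the two sequences on the slide.
sample = ["AAGCCTA", "AAGCTTA", "AAGCTTA", "AAGCCTA", "AAGCCTA"]
for pos, alleles in variable_sites(sample).items():
    print(pos, allele_frequencies(alleles))
```

Positions are 0-based, so the C/T site of the slide's example is reported at index 4.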


Allele Frequency Change (1)

Natural Selection: tendency of beneficial alleles to become more common over time and detrimental ones less common

Random Genetic Drift: fundamental tendency of any allele to vary randomly in frequency over time due to statistical variation alone

=> Both can lead to elimination or fixation of an allele
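Drift toward loss or fixation can be illustrated with a minimal Wright-Fisher style simulation. This is a sketch under stated assumptions (a neutral allele, constant diploid population size, no selection or migration); the function name, population sizes, and run counts are illustrative and do not come from the slides.

```python
import random


def wright_fisher(freq, pop_size, generations, rng):
    """Simulate neutral drift of one allele; return its frequency trajectory."""
    n_copies = 2 * pop_size  # diploid population: two gene copies per individual
    trajectory = [freq]
    for _ in range(generations):
        # Each gene copy in the next generation is drawn at random from the
        # current gene pool, so the allele count is binomially distributed.
        count = sum(1 for _ in range(n_copies) if rng.random() < freq)
        freq = count / n_copies
        trajectory.append(freq)
        if freq in (0.0, 1.0):  # allele eliminated or fixed
            break
    return trajectory


rng = random.Random(374)
for pop_size in (10, 100, 500):
    finals = [wright_fisher(0.5, pop_size, 100, rng)[-1] for _ in range(20)]
    absorbed = sum(f in (0.0, 1.0) for f in finals)
    print(f"N={pop_size}: {absorbed}/20 runs ended in loss or fixation")
```

Smaller populations should reach loss or fixation in far fewer generations, which is why a small founder group creates more opportunity for drift.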


Allele Frequency Change (2)

Migration

Avg. 1 immigrant per generation in a population

Sufficient to keep drift partially in check

Avoids complete fixation of alleles

Whole populations migrate and settle elsewhere

If initially small and then expands:

founder frequency vs. original population frequency: Δ

founder frequency vs. new location population frequency: Δ

Creates more chances for drift

Causes divergence and intergroup variation in allele frequencies

=> Group migration has opposite effect to individual migration


Population History (1)

Large populations that are geographically and genetically distant: History can be inferred by Population Trees

Assumptions:

Fissions occur randomly in time

Constant rate of neutral evolution in each population between fissions

Neutral Evolution:

The view that many changes that occur during evolution are selectively neutral

The frequency of a selectively neutral gene is as likely to decrease as to increase by genetic drift

On average the frequencies of neutral alleles remain unchanged from one generation to the next.


Summary Tree of World Population

Polymorphisms of 120 protein genes in 1,915 populations

Nature Genetics supplement, Volume 33, March 2003, pg 267


High Resolution Population History (1)

mtDNA – maternally inherited


High Resolution Population History (2)

Y chromosome – paternally inherited



Geographically close populations: genetic and geographic distances often highly correlated

Genetic distance of population pairs measured by FST

Function of geographic distance between members of population pairs

Asymptote for genetic distance reached at 1,000-2,600 miles on average

World and Asia are higher since they are not at equilibrium
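As a rough illustration of FST as a genetic distance, the sketch below uses the textbook heterozygosity form FST = (HT - HS) / HT for one biallelic locus and two equally weighted populations. This is an assumption for illustration; published studies typically use sample-size-corrected estimators over many loci, and the function name is hypothetical.

```python
def fst_two_populations(p1, p2):
    """FST for one biallelic locus, given the allele frequency in each of two populations.

    Uses FST = (H_T - H_S) / H_T with expected heterozygosity H = 2p(1 - p)
    and equal weighting of the two populations (a simplifying assumption).
    """
    # Mean expected heterozygosity within the two populations.
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    # Expected heterozygosity of the pooled population.
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)
    if h_t == 0:
        return 0.0  # locus is monomorphic overall, so there is no differentiation to measure
    return (h_t - h_s) / h_t


for p1, p2 in [(0.5, 0.5), (0.4, 0.6), (0.2, 0.8), (0.0, 1.0)]:
    print(p1, p2, round(fst_two_populations(p1, p2), 3))
```

Identical frequencies give 0 and populations fixed for different alleles give 1, with intermediate divergence in between.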




Statistical Methods

Principal Components

Good for when migration is frequent between neighbors

Gives similar results to trees (assumptions hold)

Cluster Detection

Good for analyzing large population genetic datasets

Produces same primary continental clusters as earlier methods


Tracking migrations using DNA

‘Standard model of modern human evolution’

Expansion from East Africa into rest of Africa by ~1,000 individuals

Second expansion – into Asia


Serial Founder Effect Originating in Africa

Found a linear relationship between genetic and geographic distances in a world-wide sample of human populations

Ramachandran et al., PNAS 102(44)

[Figure legend (plot symbols not recoverable): within-region comparison; Africa vs. Eurasia; America vs. Oceania]

Waypoints:

Fixed locations used in estimation of between-continental distances.

Makes estimates more reflective of human migration patterns

Based on belief humans did not cross large bodies of water while migrating (until recently)
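The waypoint idea can be sketched as a sum of great-circle legs: instead of one straight arc between two populations, the path is forced through fixed intermediate points. The coordinates below are rough, illustrative values chosen for this sketch (they are not taken from the paper), and the haversine formula with a spherical Earth is an assumption about how one might compute each leg.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


def great_circle_km(a, b):
    """Haversine distance in km between two (latitude, longitude) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot near antipodes.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))


def path_km(points):
    """Total length of a route that visits the given points in order."""
    return sum(great_circle_km(p, q) for p, q in zip(points, points[1:]))


# Hypothetical example: an East African origin, a waypoint near Cairo,
# and a destination in western Europe (approximate coordinates).
origin = (9.0, 38.7)
waypoint = (30.0, 31.0)
destination = (48.9, 2.3)

print("direct:", round(great_circle_km(origin, destination)), "km")
print("via waypoint:", round(path_km([origin, waypoint, destination])), "km")
```

Routing through a waypoint can only lengthen the path relative to the direct arc, which is how the land-route constraint enters the distance estimate.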


Alternate Hypothesis

‘Multiregional model’

Human populations originated in each continent and evolved in parallel

Study: Recovering the geographic origin of early modern humans by realistic and spatially explicit simulations

Quantifies likelihood of Unique Origin (UO) model relative to Multiregional Evolution (ME) model.



Method (1)

Simulations designed to match a previous study that used 377 Short Tandem Repeats (STRs)

Rosenberg, N.A., Pritchard, J.K., Weber, J.L., Cann, H.M., Kidd, K.K., Zhivotovsky, L.A., and Feldman, M.W. 2002. Genetic structure of human populations. Science 298: 2381–2385

STR:

A common class of polymorphism, consisting of a pattern of two or more nucleotides repeating in tandem.

Repeat unit: 2-10 base pairs

e.g. GATAGATAGATAGATAGATAGATA
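An STR allele is usually described by how many times the unit repeats. A minimal sketch of counting the longest tandem run of a known repeat unit follows; the function name is hypothetical, and real genotyping works from fragment lengths rather than string scanning.

```python
def longest_tandem_run(sequence, unit):
    """Return the largest number of back-to-back copies of `unit` found in `sequence`."""
    if not unit:
        raise ValueError("repeat unit must be non-empty")
    best = 0
    for start in range(len(sequence)):
        count = 0
        pos = start
        # Walk forward one unit at a time while the unit keeps repeating.
        while sequence.startswith(unit, pos):
            count += 1
            pos += len(unit)
        best = max(best, count)
    return best


# The slide's example allele, and a shorter hypothetical allele with flanking bases.
print(longest_tandem_run("GATAGATAGATAGATAGATAGATA", "GATA"))
print(longest_tandem_run("CCGATAGATATT", "GATA"))
```

Two individuals carrying different run lengths at the same locus carry different STR alleles.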

1052 individuals, 52 populations

[Figure labels: individuals; lines separating different populations; populations (e.g. Han); regions (e.g. East Asia)]


Method (2)

Considered populations with sample size > 20 individuals

Called the Rosenberg22 data set

Nicolas Ray, Mathias Currat, Pierre Berthier, and Laurent Excoffier, Recovering the geographic origin of early modern humans by realistic and spatially explicit simulations, Genome Res., Aug 2005; 15: 1161-1167.

Carrying Capacity:

Population level that can be supported for an organism, given the quantity of food, habitat, water and other life infrastructure present.

Friction:

The relative difficulty of moving through a deme.

Deme:

A sub-population.

Land surface of the Old World was divided into a grid of 9,226 demes, each 100 x 100 km


Simulations

Used software called SPLATCHE

Step 1: Forward time simulation of demographic and spatial expansion using environmental information

For each generation, record:

# individual genes/deme

# immigrant genes from 4 nearest-neighboring demes

Step 2: Backward time simulation of genealogy and gene diversity sampled at given locations using information generated in Step 1


Step 1

Origin of each expansion: single deme with 50 diploid individuals (100 nuclear genes)

25 origins of expansion

Onset of expansion: 4,000 generations ago (roughly 100,000 to 120,000 years ago, assuming 25 to 30 years per generation)

Each generation, occupied demes subject to:

Growth phase

Constant growth rate r = 0.3

Carrying capacity K depends on environment of deme

Emigration phase
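The growth phase listed on this slide can be sketched as density-regulated growth within one deme. The discrete logistic update used here is an assumption (the slide gives only r and K, not the exact equation), and the carrying capacity of 1,000 is a made-up value for illustration.

```python
def logistic_step(n, r, k):
    """One generation of discrete logistic growth toward carrying capacity k (assumed form)."""
    return n + r * n * (1 - n / k)


def grow(n0, r, k, generations):
    """Return deme sizes over time, starting from n0 individuals."""
    sizes = [n0]
    for _ in range(generations):
        sizes.append(logistic_step(sizes[-1], r, k))
    return sizes


# 50 founders and r = 0.3 come from the slide; K = 1000 is hypothetical.
sizes = grow(50, 0.3, 1000, 40)
for generation in (0, 5, 10, 20, 40):
    print(generation, round(sizes[generation], 1))
```

Under this form the deme grows almost exponentially while small and levels off as it approaches K, so environments with low K cap the local population early.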


Step 1 – Emigration Phase

Distributed 0.05Nt emigrants to 4 nearest neighboring demes

Nt = size of deme at time t

Exact number of emigrants sent to each deme (Ei) controlled through friction values (Fi), for each deme

Friction = relative difficulty of moving through a deme

Fi values kept within range of 0.1 to 1

Ei : computed from multinomial distribution:
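The slide's formula did not survive extraction, so the following is only a sketch of how such a multinomial split could work. It assumes that the probability of moving into a neighbor is proportional to the inverse of that neighbor's friction; this weighting, the function name, and the example values are assumptions, not the paper's stated formula.

```python
import random


def emigrants_per_neighbor(deme_size, frictions, rate, rng):
    """Split rate * deme_size emigrants among neighbors by a multinomial draw.

    Assumption: the probability of choosing neighbor i is proportional to 1 / F_i,
    so low-friction neighbors receive more emigrants on average.
    """
    n_emigrants = round(rate * deme_size)
    weights = [1.0 / f for f in frictions]
    counts = [0] * len(frictions)
    # Drawing each emigrant's destination independently and tallying the
    # results is equivalent to one multinomial sample.
    for idx in rng.choices(range(len(frictions)), weights=weights, k=n_emigrants):
        counts[idx] += 1
    return counts


rng = random.Random(2005)
# Hypothetical frictions for the north, east, south and west neighbors.
print(emigrants_per_neighbor(1000, [0.1, 0.5, 1.0, 1.0], 0.05, rng))
```

Keeping friction values at or above 0.1 also keeps the inverse weights finite under this assumed form.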


Step 2

For each of 25 geographic origins, performed 10,000 simulations of genetic diversity

Simulations generated molecular diversity data at a given number of STR loci for each of the 22 population samples

Tested 10 evolutionary scenarios

ME Model: 9 combinations of population size and migration rates

For each data set:

Index of population differentiation (RST) was computed between all pairs of populations

Provided a measure of genetic divergence between populations

[Table of ME scenarios; only the following labels survive extraction]

Scenarios 26 – 28:

Same population size, different migration rates

Africa population size > Asia and Europe population sizes

Migration rate adjusted so # emigrants same

Same migration rates

Migration rate adjusted so # emigrants same

Migration rate adjusted so # emigrants stayed same


Timeline

1. 30,000 generations ago:

Demographic expansion following first speciation event

2. For 26,000 generations:

Large, subdivided population exists

Bottleneck of 10 generations followed by range expansion

Small population went through speciation & instantaneously colonized 3 continents

Continents had large populations & exchanged occasional migrants

Three range expansions from three different origins (shown in C)


Results - Analysis (1)

What they wanted to do:

1. Assume a given genetic data set is the product of a specific evolutionary scenario

2. Estimate likelihood of all scenarios that can generate that data.

3. Choose scenario maximizing the likelihood

What they did:

1. Select a restricted number of ME and UO scenarios

2. For each, replace likelihood by measure of goodness-of-fit of genetic data to the model

3. Choose scenario maximizing goodness of fit


Results - Analysis (2)

Goodness-of-fit between observed data and model determined using simulations:

Computed a correlation coefficient between the observed and simulated RST values (the index of population differentiation calculated earlier)

Repeated this for many simulations per model to get a probability distribution of the coefficient under each model.

90% quantile value of distribution used as goodness-of-fit index

Value chosen as a result of previous extensive simulation experience

Called index the R90 statistic
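The R90 index described above can be sketched as follows: correlate the observed pairwise RST values with each simulated set, then take the 90% quantile of those correlations. This is a sketch under assumptions: Pearson correlation and a linear-interpolation quantile are choices made here, the function names are hypothetical, and the numbers are invented toy values.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length lists (inputs must not be constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def quantile(values, q):
    """Quantile by linear interpolation between order statistics (one common convention)."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def r90(observed_rst, simulated_rst_sets):
    """90% quantile of the correlations between observed and simulated pairwise RST."""
    correlations = [pearson(observed_rst, sim) for sim in simulated_rst_sets]
    return quantile(correlations, 0.9)


# Toy pairwise RST vectors (invented values, one entry per population pair).
observed = [0.02, 0.10, 0.15, 0.05, 0.12, 0.20]
simulated = [
    [0.03, 0.09, 0.14, 0.06, 0.10, 0.18],
    [0.10, 0.05, 0.08, 0.12, 0.02, 0.07],
    [0.01, 0.12, 0.16, 0.04, 0.15, 0.22],
]
print(round(r90(observed, simulated), 3))
```

Using an upper quantile rather than the mean rewards a scenario for being able to produce data close to the observations, even if many individual runs fit poorly.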


Results - Validation

Simulations were used to evaluate how correctly geographic origin of an expansion could be recovered from STR data.

10,000 simulations performed per scenario

Divided into 2 sets of 5,000 simulations

First 5,000 runs under each scenario used as pseudo-observed data

Compared to second 5,000 generated under all scenarios

Geographic origin of expansion assigned to scenario with the largest R90 statistic

Tallied up how many times they correctly assigned pseudo-observed simulations to true origin for each model
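The assignment and tally steps above can be sketched as an argmax over per-scenario R90 values followed by a count of correct assignments. The scenario labels and scores below are invented for illustration, and the function names are hypothetical.

```python
def assign_scenario(r90_by_scenario):
    """Pick the scenario whose R90 statistic is largest."""
    return max(r90_by_scenario, key=r90_by_scenario.get)


def recovery_rate(trials):
    """Fraction of pseudo-observed data sets assigned to the scenario that generated them.

    `trials` is a list of (true_scenario, {scenario: R90}) pairs.
    """
    correct = sum(1 for truth, scores in trials if assign_scenario(scores) == truth)
    return correct / len(trials)


# Three hypothetical pseudo-observed data sets with made-up R90 values.
trials = [
    ("origin_A", {"origin_A": 0.41, "origin_B": 0.22, "origin_C": 0.10}),
    ("origin_B", {"origin_A": 0.30, "origin_B": 0.35, "origin_C": 0.12}),
    ("origin_C", {"origin_A": 0.28, "origin_B": 0.15, "origin_C": 0.20}),
]
print(recovery_rate(trials))
```

In this toy input the third data set is misassigned, so the recovery rate is two out of three.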


Results - Distinguishing UO vs. ME models (1)

Used similar validation method to differentiate data sets generated under the two models

Dataset is correctly assigned if chosen scenario belongs to same evolutionary model that generated it, i.e.

Is a data set generated under a UO scenario assigned to any geographic origin under the UO model?

Is a data set generated under any ME scenario assigned to any ME scenario?

Correct regardless of location of origin or ME scenario


Results - Distinguishing UO vs. ME models (2)

Results showed evolutionary models well discriminated even with a single locus:

UO correct assignment frequency >> ME recovery rate


Results - Unknown Origins

How is the probability of recovering the source of expansion affected if we assume an incorrect geographical origin?

Performed 10,000 simulations on 14 alternative ‘true’ origins

Genetic data sets from 25 simulated potential origins compared with the ‘true’ origins

Measured probability of recovering correct geographic region of origin (4 regions)

Result: Correct assignment per region was much higher when origin was known:

0.771 vs. 0.882 (20 loci)

0.852 vs. 0.999 (377 loci)


Results – Human Nuclear STRs

R90 goodness-of-fit statistics between original Rosenberg22 and scenario generated data showed:

UO model fit better overall

BUT pointed to North African origin

Suspected this was caused by 377 STRs having European ascertainment bias.

Ran tests to see if results would be affected by an ascertainment bias

Tests showed probability of inferring correct origin drops when using biased data

Re-computed R90 statistics after correcting bias, and found East African origin was now more favored.

Agrees with other recent studies


Summary

Attempted to find geographic origin of modern humans from patterns in current world-wide genetic diversity

Explicitly accounted for physical constraints to dispersion

Results showed that the origin can be well recovered if:

We have a large number of markers

Markers do not suffer ascertainment bias

Simulated origin is close to true origin

Simulations showed UO and ME models could be clearly distinguished

UO model favored

R90 statistic four times higher for best UO scenario (0.1) vs. best ME scenario (0.023)

Thank you

