
Human Migrations Anjalee Sujanani CS 374 – Algorithms in Biology Tuesday, October 24, 2006.


Human Migrations

Anjalee Sujanani

CS 374 – Algorithms in Biology Tuesday, October 24, 2006


Papers

Review: The application of molecular genetic approaches to the study of human evolution
L. Luca Cavalli-Sforza & Marcus W. Feldman, Nature Genetics 33, 266-275 (2003)

Recovering the geographic origin of early modern humans by realistic and spatially explicit simulations
Nicolas Ray, Mathias Currat, Pierre Berthier, and Laurent Excoffier, Genome Res., Aug 2005; 15: 1161-1167

Support from the relationship of genetic and geographic distance in human populations for a serial founder effect originating in Africa
Sohini Ramachandran, Omkar Deshpande, Charles C. Roseman, Noah A. Rosenberg, Marcus W. Feldman, and L. Luca Cavalli-Sforza, PNAS, November 1, 2005, Vol. 102, No. 44


Timeline: Study of Genetic Variation

1919: Existence of human genetic variation first demonstrated in a study of the ABO gene

1966: Studies showed that almost every protein has genetic variants

These variants became useful markers for population studies

Marker: a gene or other segment of DNA whose position on a chromosome is known

1980: A new method to construct a genetic linkage map of the human genome by using radioisotopes generated more new markers

1986: Polymerase Chain Reaction (PCR) developed

Allows a small amount of DNA to be amplified exponentially

Expanded the number of studies that could work directly with DNA

1990s: Development of automated DNA sequencing


Shaping of Genetic Variation

Human evolution

Genome structure

Population history

Human migration

Dating origin of our species

Tracking migrations of our species using DNA

Relationship of separated human populations


Source: http://www.ncbi.nlm.nih.gov/Class/NAWBIS/Modules/Variation/var17.html

Gene Variation

All genetic variation is caused by mutations

Most common: Single Nucleotide Polymorphisms

A single base change occurring in a population at a frequency of >1% is termed a single nucleotide polymorphism (SNP).

When a single base change occurs at <1%, it is considered to be a mutation.

SNPs: DNA sequence variation occurring when a single nucleotide - A, T, C, or G - in the genome (or other shared sequence) differs between members of a species.

e.g. AAGCCTA

AAGCTTA

Here, we say that there are two alleles: C and T

Allele frequency: the proportion of gene copies in a population that carry a given allele; it describes variation at a single nucleotide position

Polymorphism: Variation in DNA sequences between individuals
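The definitions above can be sketched in a few lines of Python. This is only an illustration: the four-sequence sample is hypothetical, and the function names `allele_frequencies` and `is_snp` are invented for this example. The 1% threshold is the one quoted on the slide.

```python
from collections import Counter


def allele_frequencies(sequences, position):
    """Return {allele: frequency} at a 0-based position across aligned sequences."""
    counts = Counter(seq[position] for seq in sequences)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def is_snp(sequences, position, threshold=0.01):
    """True if the second most common allele exceeds the threshold (the >1% rule)."""
    freqs = sorted(allele_frequencies(sequences, position).values(), reverse=True)
    return len(freqs) > 1 and freqs[1] > threshold


# Hypothetical sample: three copies carry C and one carries T at index 4.
sample = ["AAGCCTA", "AAGCCTA", "AAGCCTA", "AAGCTTA"]
freqs = allele_frequencies(sample, 4)
print(freqs, is_snp(sample, 4), is_snp(sample, 0))
```

With a real data set the sample would contain many more gene copies, so that a 1% frequency is actually resolvable.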


Allele Frequency Change (1)

Natural Selection: tendency of beneficial alleles to become more common over time and detrimental ones less common

Random Genetic Drift: fundamental tendency of any allele to vary randomly in frequency over time due to statistical variation alone

=> Both can lead to elimination or fixation of an allele
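Drift leading to loss or fixation can be illustrated with a minimal Wright-Fisher style simulation. The model choice, the population of 40 gene copies, and the seed are assumptions made for this sketch; none of them come from the slides.

```python
import random


def wright_fisher(p0, n_genes, max_generations, rng):
    """Follow one allele's frequency under pure drift until loss, fixation, or the limit."""
    p = p0
    trajectory = [p]
    for _ in range(max_generations):
        if p in (0.0, 1.0):
            break  # allele has been lost or fixed; drift can no longer change it
        # Each gene copy in the next generation carries the allele with probability p.
        copies = sum(1 for _ in range(n_genes) if rng.random() < p)
        p = copies / n_genes
        trajectory.append(p)
    return trajectory


rng = random.Random(374)  # fixed seed so the run is reproducible
traj = wright_fisher(p0=0.5, n_genes=40, max_generations=10_000, rng=rng)
final = traj[-1]
print(len(traj) - 1, "generations simulated, final frequency", final)
```

In a population this small the allele is expected to reach frequency 0 or 1 fairly quickly, with no selection involved.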


Allele Frequency Change (2)

Migration

Avg. 1 immigrant per generation in a population

Sufficient to keep drift partially in check

Avoids complete fixation of alleles

Whole populations migrate and settle elsewhere

If the founding group is initially small and then expands:

founder frequency vs. original population frequency: Δ

founder frequency vs. new location population frequency: Δ

Creates more chances for drift

Causes divergence and intergroup variation in allele frequencies

=> Group migration has opposite effect to individual migration


Population History (1)

Large populations that are geographically and genetically distant: History can be inferred by Population Trees

Assumptions:

Fissions occur randomly in time

Constant rate of neutral evolution in each population between fissions

Neutral Evolution:

Many of the changes that occur during evolution are selectively neutral

The frequency of a selectively neutral gene is as likely to decrease as to increase by genetic drift

On average the frequencies of neutral alleles remain unchanged from one generation to the next.
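The last two statements can be made concrete with a short derivation. It assumes Wright-Fisher style binomial sampling of 2N gene copies per generation, which is a modeling assumption added here and not stated on the slide; p_t is the allele frequency in generation t and X_{t+1} is the number of copies in the next generation.

```latex
X_{t+1} \mid p_t \sim \mathrm{Binomial}(2N,\; p_t), \qquad p_{t+1} = \frac{X_{t+1}}{2N}

\mathbb{E}[\,p_{t+1} \mid p_t\,] = \frac{2N\,p_t}{2N} = p_t, \qquad
\mathrm{Var}[\,p_{t+1} \mid p_t\,] = \frac{p_t\,(1 - p_t)}{2N}
```

The expectation is unchanged, so increases and decreases balance on average, while the variance term shows that fluctuations are larger in small populations.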


Summary Tree of World Population

Polymorphisms of 120 protein genes

1,915 populations

Nature genetics supplement, Volume 33, March 2003, pg 267


High Resolution Population History (1)

mtDNA – maternally inherited


High Resolution Population History (2)

Y chromosome – paternally inherited


Population History (2)

Geographically close populations

Genetic and geographic distances are often highly correlated

Genetic distance of population pairs measured by FST

Plotted as a function of geographic distance between members of population pairs

Genetic distance reaches an asymptote at 1,000-2,600 miles on average

World and Asia values are higher because those populations are not at equilibrium

Nature genetics supplement, Volume 33, March 2003, pg 268
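As a rough illustration of the FST distance mentioned above, the sketch below uses the common heterozygosity-based definition FST = (H_T - H_S) / H_T for a single biallelic locus and two equally weighted populations. Those simplifications, and the function names, are assumptions of this example; the estimator used in the review may differ.

```python
def heterozygosity(p):
    """Expected heterozygosity 2p(1-p) for a biallelic locus with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def fst_two_populations(p1, p2):
    """F_ST = (H_T - H_S) / H_T for two equally weighted populations at one locus."""
    p_bar = (p1 + p2) / 2.0
    h_total = heterozygosity(p_bar)                               # pooled population
    h_within = (heterozygosity(p1) + heterozygosity(p2)) / 2.0    # mean within-population
    if h_total == 0.0:
        return 0.0  # locus is monomorphic overall, so there is nothing to partition
    return (h_total - h_within) / h_total


# Hypothetical allele frequencies for two population pairs.
print(fst_two_populations(0.45, 0.55))  # similar frequencies, small F_ST
print(fst_two_populations(0.20, 0.80))  # divergent frequencies, larger F_ST
```

Identical frequencies give 0 and complete fixation of different alleles gives 1, which is why FST works as a distance between populations.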


Population History (3)

Statistical Methods: Principal Components

Good for when migration is frequent between neighbors

Gives similar results to trees (when the tree assumptions hold)

Cluster Detection

Good for analyzing large population genetic datasets

Produces same primary continental clusters as earlier methods


Tracking migrations using DNA

‘Standard model of modern human evolution’: expansion from East Africa

Nature genetics supplement, Volume 33, March 2003, pg 270

Expansion from East Africa into rest of Africa by ~ 1,000 individuals

Second expansion – into Asia


Serial Founder Effect Originating in Africa

Found a linear relationship between genetic and geographic distances in a world-wide sample of human populations

Ramachandran et al.,PNAS 102(44)

[Figure legend: within-region comparison; Africa vs. Eurasia; America vs. Oceania]

Waypoints:

Fixed locations used in estimation of between-continental distances.

Makes estimates more reflective of human migration patterns

Based on belief humans did not cross large bodies of water while migrating (until recently)
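The waypoint idea can be sketched as a sum of great-circle legs. The haversine formula is standard; the coordinates below are approximate, illustrative values chosen for this example and are not taken from the paper, and the helper names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def great_circle_km(a, b):
    """Haversine distance in km between two (latitude, longitude) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def path_km(points):
    """Total length of a route that visits the points in order."""
    return sum(great_circle_km(p, q) for p, q in zip(points, points[1:]))


# Illustrative coordinates only (assumptions, not values from the paper).
east_africa = (9.0, 38.7)
waypoint = (30.0, 31.2)       # a land-bridge style waypoint between Africa and Eurasia
western_europe = (48.9, 2.3)

direct = great_circle_km(east_africa, western_europe)
via_waypoint = path_km([east_africa, waypoint, western_europe])
print(round(direct), round(via_waypoint))
```

Routing through the waypoint can only lengthen the path, which is the intended effect: the distance follows a plausible overland route instead of cutting across water.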


Alternate Hypothesis

‘Multiregional model’

Human populations originated in each continent and evolved in parallel

Study: Recovering the geographic origin of early modern humans by realistic and spatially explicit simulations

Quantifies likelihood of Unique Origin (UO) model relative to Multiregional Evolution (ME) model.

Nature genetics supplement, Volume 33, March 2003, pg 270


Method (1)

Simulations designed to match a previous study that used 377 Short Tandem Repeats (STRs)

Rosenberg, N.A., Pritchard, J.K., Weber, J.L., Cann, H.M., Kidd, K.K., Zhivotovsky, L.A., and Feldman, M.W. 2002. Genetic structure of human populations. Science 298: 2381–2385

STR:

A common class of polymorphism, consisting of a pattern of two or more nucleotides repeating in tandem.

Repeat unit: 2-10 base pairs

e.g. GATAGATAGATAGATAGATAGATA
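An STR allele is usually described by how many times the unit repeats. The helper below, whose name is invented for this sketch, counts the longest run of back-to-back copies of a given unit in a sequence.

```python
def longest_tandem_run(sequence, unit):
    """Return the largest number of back-to-back copies of `unit` found in `sequence`."""
    if not unit:
        raise ValueError("repeat unit must be non-empty")
    best = 0
    step = len(unit)
    for start in range(len(sequence)):
        count = 0
        pos = start
        # Walk forward one unit at a time while the unit keeps matching.
        while sequence.startswith(unit, pos):
            count += 1
            pos += step
        best = max(best, count)
    return best


slide_example = "GATAGATAGATAGATAGATAGATA"
repeats = longest_tandem_run(slide_example, "GATA")
print(repeats)
```

Two individuals who differ in this count carry different alleles at the locus, which is what makes STRs useful as markers.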

Data set: 1056 individuals, 52 populations

[Figure: individuals grouped by population (e.g. Han) and by region (e.g. East Asia); lines separate different populations]


Method (2)

Considered populations with sample size > 20 individuals

Called the Rosenberg22 data set

Nicolas Ray, Mathias Currat, Pierre Berthier, and Laurent Excoffier, Recovering the geographic origin of early modern humans by realistic and spatially explicit simulations, Genome Res., Aug 2005; 15: 1161-1167.

Carrying Capacity:

Population level that can be supported for an organism, given the quantity of food, habitat, water and other life infrastructure present.

Friction:

The relative difficulty of moving through a deme.

Deme:

A sub-population.

Land surface of the Old World was divided into a grid of 9,226 demes, each 100 km x 100 km


Simulations

Used software called SPLATCHE

Step 1: Forward time simulation of demographic and spatial expansion using environmental information

For each generation, record:

# individual genes/deme

# immigrant genes from 4 nearest-neighboring demes

Step 2: Backward time simulation of genealogy and gene diversity sampled at given locations using information generated in Step 1


Step 1

Origin of each expansion: single deme with 50 diploid individuals (100 nuclear genes)

25 origins of expansion

Onset of expansion: 4,000 generations ago (about 120,000 years ago, assuming 30 years per generation)

Each generation, occupied demes subject to:

Growth phase

Constant growth rate r = 0.3

Carrying capacity K depends on environment of deme

Emigration phase
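The growth phase listed above can be sketched with a discrete logistic update. The specific update rule is a standard textbook form assumed here, not necessarily the exact one SPLATCHE uses, and K = 500 is a hypothetical carrying capacity. Only r = 0.3 and the starting size of 50 individuals come from the slide.

```python
def logistic_step(n, r, k):
    """One generation of discrete logistic growth toward carrying capacity k."""
    return n + r * n * (1.0 - n / k)


def grow(n0, r, k, generations):
    """Return the deme size at generation 0, 1, ..., generations."""
    sizes = [float(n0)]
    for _ in range(generations):
        sizes.append(logistic_step(sizes[-1], r, k))
    return sizes


# r and n0 follow the slide; k is a made-up value for illustration.
sizes = grow(n0=50, r=0.3, k=500.0, generations=100)
print([round(s) for s in sizes[:10]], round(sizes[-1]))
```

Growth is close to exponential while the deme is small and levels off as it approaches K, so the environment of each deme caps its size.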


Step 1 – Emigration Phase

Distributed 0.05 Nt emigrants to the 4 nearest neighboring demes

Nt = size of deme at time t

Exact number of emigrants sent to each deme (Ei) controlled through friction values (Fi) for each deme

Friction = relative difficulty of moving through a deme

Fi values kept within range of 0.1 to 1

Ei: computed from a multinomial distribution [equation not preserved in this transcript]
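A minimal sketch of the emigration phase follows. The multinomial draw and the 5% emigration rate come from the slide, but the weighting of neighbors by 1/Fi is an assumption made purely for illustration: the slide's formula was not preserved, so this should not be read as the weighting used by SPLATCHE or the paper.

```python
import random
from collections import Counter


def emigrants_per_neighbor(deme_size, frictions, rng, emigration_rate=0.05):
    """Split emigration_rate * deme_size emigrants among neighbors with a multinomial draw."""
    total = round(emigration_rate * deme_size)
    # ASSUMPTION: weight each neighbor by 1/F_i so low-friction demes receive more emigrants.
    weights = [1.0 / f for f in frictions]
    # Drawing each emigrant's destination independently is a multinomial sample.
    draws = rng.choices(range(len(frictions)), weights=weights, k=total)
    counts = Counter(draws)
    return [counts.get(i, 0) for i in range(len(frictions))]


rng = random.Random(2005)  # fixed seed for reproducibility
# Hypothetical frictions for the 4 nearest neighbors; the first is the easiest to enter.
emigrants = emigrants_per_neighbor(400, [0.1, 1.0, 1.0, 1.0], rng)
print(emigrants)
```

Whatever weighting is used, the counts always sum to the total number of emigrants, so the deme loses exactly 0.05 Nt individuals per generation.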


Step 2

For each of 25 geographic origins, performed 10,000 simulations of genetic diversity

Simulations generated molecular diversity data at a given number of STR loci for each of the 22 population samples

Tested 10 evolutionary scenarios

ME Model: 9 combinations of population size and migration rates

For each data set:

Index of population differentiation (RST) was computed between all pairs of populations

Provided a measure of genetic divergence between populations

Scenarios 26 – 28:

Same population size, different migration rates

Scenarios 29 – 31:

Africa population size > Asia and Europe population sizes

Migration rate adjusted so that the number of emigrants stayed the same

Scenarios 32 – 34:

Africa population size > Asia and Europe population sizes

Same migration rates
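The RST index of population differentiation mentioned above is based on STR allele sizes. The sketch below uses a simplified variance-ratio form, (V_total - V_within) / V_total, for two equally sized samples at one locus. That simplification is an assumption of this example; the exact estimator used in the paper is not given on the slides.

```python
from statistics import pvariance


def rst_variance_ratio(pop_a, pop_b):
    """(V_total - V_within) / V_total using allele sizes measured in repeat counts."""
    v_total = pvariance(pop_a + pop_b)                        # variance of pooled sample
    v_within = (pvariance(pop_a) + pvariance(pop_b)) / 2.0    # mean within-population variance
    if v_total == 0:
        return 0.0  # every allele has the same size, so there is no variance to partition
    return (v_total - v_within) / v_total


# Hypothetical repeat counts at one STR locus in two populations.
pop_a = [10, 11, 10, 12, 11, 10]
pop_b = [14, 15, 14, 13, 15, 14]
print(rst_variance_ratio(pop_a, pop_b))
```

Computing this for every pair of the 22 populations gives the matrix of pairwise divergences that the later goodness-of-fit step compares between observed and simulated data.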



Timeline

Unique Origin (UO) model:

1. 30,000 generations ago:

Demographic expansion following first speciation event

2. For 26,000 generations:

Large, subdivided population exists

3. 4,000 generations ago:

Bottleneck of 10 generations followed by range expansion

Multiregional Evolution (ME) model:

1. 30,000 generations ago:

Small population went through speciation & instantaneously colonized 3 continents

2. For 26,000 generations:

Continents had large populations & exchanged occasional migrants

3. 4,000 generations ago:

Three range expansions from three different origins (shown in C)



Results - Analysis (1)

What they wanted to do:

1. Assume a given genetic data set is the product of a specific evolutionary scenario

2. Estimate likelihood of all scenarios that can generate that data

3. Choose scenario maximizing the likelihood

What they did:

1. Select a restricted number of ME and UO scenarios

2. For each, replace likelihood by measure of goodness-of-fit of genetic data to the model

3. Choose scenario maximizing goodness of fit


Results - Analysis (2)

Goodness-of-fit between observed data and model determined using simulations:

Computed a correlation coefficient between observed and simulated data, using the index of population differentiation (RST) values calculated earlier

Repeated this for many simulations per model to get a probability distribution of the coefficient under each model.

90% quantile value of distribution used as goodness-of-fit index

Value chosen as a result of previous extensive simulation experience

Called index the R90 statistic
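The R90 index described above can be sketched as follows. The Pearson correlation and the nearest-rank quantile convention are assumptions of this example, since the slides do not say which correlation or quantile definition was used, and the RST vectors are made-up numbers.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def quantile(values, q):
    """Nearest-rank quantile (an assumed convention)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]


def r90(observed_rst, simulated_rst_sets):
    """90% quantile of the correlations between observed and simulated pairwise RST."""
    correlations = [pearson(observed_rst, sim) for sim in simulated_rst_sets]
    return quantile(correlations, 0.90)


# Hypothetical pairwise RST values: one observed vector, three simulated ones.
observed = [0.10, 0.20, 0.30, 0.25]
simulated = [[0.12, 0.18, 0.33, 0.24], [0.30, 0.20, 0.10, 0.15], [0.11, 0.22, 0.28, 0.27]]
print(r90(observed, simulated))
```

In the study each scenario would contribute thousands of simulated vectors instead of three, and the scenario with the highest R90 is the one that fits best.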


Results - Validation

Simulations were used to evaluate how correctly geographic origin of an expansion could be recovered from STR data.

10,000 simulations performed per scenario

Divided into 2 sets of 5,000 simulations

First 5,000 runs under each scenario used as pseudo-observed data

Compared to second 5,000 generated under all scenarios

Geographic origin of expansion assigned to scenario with the largest R90 statistic

Tallied up how many times they correctly assigned pseudo-observed simulations to true origin for each model
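The assignment-and-tally step above amounts to an argmax followed by a count. In the sketch below the scenario names and R90 scores are hypothetical, as are the function names.

```python
def assign_origin(r90_by_scenario):
    """Pick the scenario whose R90 statistic is largest."""
    return max(r90_by_scenario, key=r90_by_scenario.get)


def recovery_rate(trials):
    """Fraction of trials, given as (true_origin, {scenario: R90}), assigned correctly."""
    correct = sum(1 for true_origin, scores in trials
                  if assign_origin(scores) == true_origin)
    return correct / len(trials)


# Hypothetical pseudo-observed data sets with made-up R90 scores per candidate origin.
trials = [
    ("origin_A", {"origin_A": 0.31, "origin_B": 0.12, "origin_C": 0.05}),
    ("origin_B", {"origin_A": 0.22, "origin_B": 0.27, "origin_C": 0.10}),
    ("origin_C", {"origin_A": 0.18, "origin_B": 0.09, "origin_C": 0.15}),
    ("origin_A", {"origin_A": 0.40, "origin_B": 0.11, "origin_C": 0.20}),
]
rate = recovery_rate(trials)
print(rate)
```

The same tally, grouped by evolutionary model instead of by origin, gives the UO versus ME discrimination rates discussed on the following slides.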


Results - Distinguishing UO vs. ME models (1)

Used similar validation method to differentiate data sets generated under the two models

A data set is correctly assigned if the chosen scenario belongs to the same evolutionary model that generated it, i.e.

Is a data set generated under a UO scenario assigned to any geographic origin under the UO model?

Is a data set generated under any ME scenario assigned to any ME scenario?

Correct regardless of location of origin or ME scenario


Results - Distinguishing UO vs. ME models (2)

Results showed the evolutionary models were well discriminated with a single locus:

UO correct assignment frequency >> ME recovery rate


Results - Unknown Origins

How is the probability of recovering the source of expansion affected if we assume an incorrect geographical origin?

Performed 10,000 simulations on 14 alternative ‘true’ origins

Genetic data sets from 25 simulated potential origins compared with the ‘true’ origins

Measured probability of recovering correct geographic region of origin (4 regions)

Result: Correct assignment per region was much higher when origin was known:

0.771 vs. 0.882 (20 loci)

0.852 vs. 0.999 (377 loci)


Results – Human Nuclear STRs

R90 goodness-of-fit statistics between original Rosenberg22 and scenario generated data showed:

UO model fit better overall

BUT pointed to North African origin

Suspected this was caused by the 377 STRs having a European ascertainment bias.

Ran tests to see if results would be affected by an ascertainment bias

Tests showed probability of inferring correct origin drops when using biased data

Re-computed R90 statistics after correcting bias, and found East African origin was now more favored.

Agrees with other recent studies


Summary

Attempted to find geographic origin of modern humans from patterns in current world-wide genetic diversity

Explicitly accounted for physical constraints to dispersion

Results showed that the origin can be well recovered if:

We have a large number of markers

Markers do not suffer ascertainment bias

Simulated origin is close to true origin

Simulations showed UO and ME models could be clearly distinguished

UO model favored

R90 statistic four times higher for best UO scenario (0.1) vs. best ME scenario (0.023)


Thank you


