Post on 18-Dec-2015

Using genetics to study human history and natural selection

David Reich, Harvard Medical School Department of Genetics

Broad Institute

tttctccatttgtcgtgacacctttgttgacaccttcatttctgcattctcaattctatttcactggtctatggcagagaacacaaaatatggccagtggcctaaatccagcctactaccttttttttttttttgtaacattttactaacatagccattcccatgtgtttccatgtgtctgggctgcttttgcactctaatggcagagttaagaaattgtagcagagaccacaatgcctcaaatatttactctacagccctttataaaaacagtgtgccaactcctgatttatgaacttatcattatgtcaataccatactgtctttattactgtagttttataagtcatgacatcagataatgtaaatcctccaactttgtttttaatcaaaagtgttttggccatcctagatatactttgtattgccacataaatttgaagatcagcctgtcagtgtctacaaaatagcatgctaggattttgatagggattgtgtagaatctatagattaattagaggagaatgactatcttgacaatactgctgcccctctgtattcgtgggggattggttccacaacaacacccaccccccactcggcaacccctgaaacccccacatcccccagcttttttcccctgctaccaaaatccatggatgctcaagtccatataaaatgccatactatttgcatataacctctgcaatcctcccctatagtttagatcatctctagattacttataatactaataaaatctaaatgctatgtaaatagttgctatactgtgttgagggttttttgttttgttttgttttatttgtttgtttgtttgtattttaagagatggtgtcttgctttgttgcccaggctggagtgcagtggtgagatcatagcttactgcagcctcaaactcctggactcaaacagtcctcccacctcagcctcccaaagtgctgggatacaggtgtgacccactgtgcccagttattattttttatttgtattattttactgttgtattatttttaattattttttctgaatattttccatctatagttggttgaatcatggatgtggaacaggcaaatatggagggctaactgtattgcatcttccagttcatgagtatgcagtctctctgtttatttaaagttttagtttttctcaaccatgtttacttttcagtatacaagactttgacgttttttgttaaatgtatttgtaagtattttattatttgtgatgttatttaaaaagaaattgttgactgggcacagtggctcacgcctgtaatcccagcactttgggaggctgaggcgggcagatcacgaggtcaggagatcaagaccatcctggctaacatggtaaaaccccgtctctactaaaaatagaaaaaaattagccaggcgtggtggcgagtgcctgtagtcccagctactcgggaggctgaggcaggagaatggtgtgaacctgggaggcggagcttgcagtgagctgagatcgtgccactgcattccagcctgcgtgacagagcgagactctgtcaaaaaaataaataaaatttaaaaaaagaagaagaaattattttcttaatttcattttcaggttttttatttatttctactatatggatacatgattgatttttgtatattgatcatgtatcctgcaaactagctaacatagtttattatttctctttttttgtggattttaaaggattttctacatagataaataaacacacataaacagttttacttctttcttttcaacctagactggatgcattttttgtttttgtttgtttgtttgctttttaacttgctgcagtgactagagaatgtattgaagaatatattgttgaacaaaagcagtgagagtggacatccctgctttccccctgattttagggggaatgttttcagtctttcactatttaatatgattttagctataggtttatcctagatccctgttatcatgttgag
gaaattcccttctatttctagtttgttgagattttttaattcatgtgattgcgctatctggctttgctctca

A 2-part talk:

Section 1: How human history affects human genetic variation

Section 2: Detecting selection by the pattern of genetic variation and finding disease genes

How does human history affect genetic variation?

A genome-wide survey of Linkage Disequilibrium

Section 1

Linkage disequilibrium is a phenomenon whereby genetic variants are associated: people who have one tend to have a second as well
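The association described above is usually quantified with the coefficient D and its normalized form D', the statistic plotted on later slides. The sketch below is a minimal illustration, not the talk's analysis code: it assumes two biallelic sites with alleles A/a and B/b and phased haplotype counts, and the function name `ld_stats` is hypothetical.

```python
def ld_stats(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, D', r^2) for two biallelic sites from phased haplotype counts."""
    n = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / n                 # frequency of the A-B haplotype
    p_A = (n_AB + n_Ab) / n         # frequency of allele A at site 1
    p_B = (n_AB + n_aB) / n         # frequency of allele B at site 2

    # D: excess of the A-B haplotype over the expectation under independence.
    D = p_AB - p_A * p_B

    # D' rescales D by the largest magnitude it could take given the allele frequencies.
    if D >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = D / d_max if d_max > 0 else 0.0

    # r^2: squared correlation between the two sites.
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    r2 = D * D / denom if denom > 0 else 0.0
    return D, d_prime, r2


if __name__ == "__main__":
    # 100 chromosomes in which A and B tend to travel together.
    print(ld_stats(40, 10, 10, 40))
```

D' keeps the sign of D, so |D'| is the magnitude and the sign records which pair of alleles is associated.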

Linkage Disequilibrium Explained

[Figure: variations in chromosomes within a population. Starting from a common ancestor, variations emerge over time up to the present; a disease mutation is marked.]

What Determines Extent of LD?

[Figure: chromosomes carrying a disease-causing mutation at three time points: 2,000 generations ago, 1,000 generations ago, and the present.]
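One textbook way to see why the age of a variant matters for the extent of LD: under a simple deterministic model, LD between two sites shrinks by a factor (1 - c) each generation, where c is the recombination fraction between them. The sketch below is an illustration under stated assumptions, not the calculation behind the slides. The uniform rate of 1e-8 recombinations per base pair per generation (about 1 cM per Mb) and both function names are assumptions.

```python
RECOMB_PER_BP = 1e-8  # assumed uniform rate, roughly 1 cM per Mb


def ld_remaining(distance_bp, generations, rate=RECOMB_PER_BP):
    """Fraction of the initial LD expected to remain: (1 - c) ** t."""
    c = min(0.5, distance_bp * rate)  # recombination fraction, capped at free recombination
    return (1.0 - c) ** generations


def half_decay_distance_bp(generations, rate=RECOMB_PER_BP):
    """Distance at which LD is expected to fall to half after the given number of generations."""
    c = 1.0 - 0.5 ** (1.0 / generations)
    return c / rate


if __name__ == "__main__":
    for gens in (1000, 2000):
        print(gens, "generations:", round(half_decay_distance_bp(gens)), "bp")
```

Under this model, doubling the number of generations roughly halves the distance over which LD persists, so older variants are expected to show shorter-range association.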

How Far Does Association (LD) Extend Between Neighboring Common Sites?

[Figure: distance scale from 0 kb to 160 kb, marked at 5, 10, 20, 40, and 80 kb, with the range of uncertainty indicated.]

• Theoretical: 3-8 kb

Strategy for Assessing Extent of LD

• 19 regions
• 44 Caucasian samples from Utah
• a great deal of DNA sequencing per sample

[Figure: sampling design by distance from the core single nucleotide polymorphism (SNP), on a scale from 0 kb to 160 kb with marks at 5, 10, 20, 40, and 80 kb.]

[Figure: linkage disequilibrium (|D'|, scale 0 to 1) versus distance between SNPs (base pairs), at 1 kb, 5 kb, 10 kb, 20 kb, 40 kb, 80 kb, 160 kb, and unlinked. Series: data; previous theoretical prediction.]

A Genome-Wide Assessment of Linkage Disequilibrium

Disease Gene Mapping

Human history

MYSTERY: What explains the long-range LD?

Important event in population history?

Positive Control: 48 Swedes

Identical pattern to Utah

[Figure: linkage disequilibrium (D', scale 0 to 1) versus distance between SNPs (base pairs), at 5 kb, 10 kb, 20 kb, 40 kb, 80 kb, and 160 kb. Series: Utah LD curve; Sweden LD; Sweden LD with sign of D' set by Utah.]

96 Nigerians (Yoruba)

Much Less LD

Associations in Africans a SUBSET of those in Caucasians

MUST be influenced by population history

[Figure: linkage disequilibrium (D', scale -0.2 to 1) versus distance between SNPs (base pairs), at 5 kb, 10 kb, 20 kb, 40 kb, 80 kb, and 160 kb. Series: Utah LD curve; Nigeria LD; Nigeria LD with sign of D' set by Utah.]

Confirmation of less LD in Africans from Direct DNA Sequencing

[Figure: mean |D'| (scale 0 to 1) at distances of 500 bp, 5 kb, 10 kb, 20 kb, 40 kb, 80 kb, and 160 kb, for Nigerian and Utah samples, with a count label on each point.]

Anna Di Rienzo also shows this pattern

More evidence from genotyping ~5,000 SNPs (Gabriel et al. 2002)

[Figure: mean |D'| (scale 0 to 1) versus distance (0 to 150,000 bp) for Caucasian, African-American, Asian, and Yoruban samples.]

K. Kidd, J. Kidd, and Sarah Tishkoff also show this

Explanation: Bottleneck or ‘Founder Effect’ in History of North Europeans

What was this event?

(1) Out of Africa?

[Figure: an ancestral population giving rise to North Europeans and to Yoruba ancestors.]

• likely <10 founding chromosomes ~100,000 years ago

(2) Founding of Europe?

Open Mysteries

• What caused the bottleneck event? “Out of Africa” migration?
• How many people were involved? When did it occur?
• Can we better understand when the founder event occurred, and how many people were involved?

Acknowledgements for Section 1

Collaborators: Michele Cargill, Stacey Bolk, James Ireland, Pardis C. Sabeti, Daniel J. Richter, Thomas Lavery, Rose Kouyoumjian, Shelli F. Farhadian, Ryk Ward, Eric S. Lander

Samples: Leif Groop, Richard Cooper, Charles Rotimi

Using Long-Range Linkage Disequilibrium to Detect Positive Selection in the Genome

Section 2

Overview

1. The difficulty of detecting genomic regions affected by natural selection

2. The long-range haplotype test

3. Results for two genes: G6PD and CD40 ligand

Existing formal tests for selection

DNA sequence analysis (weak):
• Tajima’s D
• HKA test
• McDonald and Kreitman
• Fu and Li’s D
• Ka/Ks ratio

Genotyping-based tests: not general at present

Our test is based on the relationship between allele frequency and extent of linkage disequilibrium

No selection
Old alleles: • low or high frequency • short-range LD
Young alleles: • low frequency • long-range LD

Positive Selection
Young alleles: • high frequency • long-range LD

The signal of selection

[Figure: linkage disequilibrium (homozygosity) versus frequency, contrasting neutrality with positive selection.]

Paradigm of the Core Region

[Figure: a gene and its core region, with core haplotypes numbered 1 to 5.]

Long-range multi-SNP haplotypes

[Figure: core haplotypes 1 to 5 at a gene, with core markers and long-range markers (SNPs C/T, A/G, A/G, C/T, C/T, C/T), illustrating decay of LD.]

Long-range multi-SNP haplotypes

Decay of homozygosity (probability, at any distance, that any two haplotypes that start out the same have all the same SNP genotypes)

[Figure: haplotypes extending from the core markers of a gene across long-range markers (SNPs C/T, A/G, A/G, C/T, C/T, C/T); homozygosity decays with distance from the core: 100%, 75%, 35%, 18%.]
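The homozygosity defined on this slide (the probability that two haplotypes sharing the same core also match at every marker out to a given distance) can be computed by counting identical pairs. The sketch below is a minimal illustration rather than the speaker's code. It assumes each haplotype is a string of alleles ordered by increasing distance from the core and that all haplotypes passed in carry the same core haplotype; the function names are hypothetical.

```python
from collections import Counter
from math import comb


def pair_homozygosity(haps, k):
    """Probability that two distinct haplotypes agree at the first k markers."""
    if len(haps) < 2:
        raise ValueError("need at least two haplotypes")
    counts = Counter(h[:k] for h in haps)         # group by extended haplotype
    same_pairs = sum(comb(c, 2) for c in counts.values())
    return same_pairs / comb(len(haps), 2)


def homozygosity_decay(haps):
    """Homozygosity at the core (k = 0) and after each successive marker."""
    return [pair_homozygosity(haps, k) for k in range(len(haps[0]) + 1)]


if __name__ == "__main__":
    # Four chromosomes with the same core, typed at three long-range markers.
    print(homozygosity_decay(["CAG", "CAG", "CAA", "TGG"]))
```

The value starts at 1 at the core, because all input haplotypes share it by construction, and can only stay level or fall as markers are added.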

Two genes associated with malaria resistance

G6PD (1960’s):
• well established association to malaria resistance
• selection demonstrated in 2001 by Tishkoff et al.

CD40 ligand (2002):
• recent association by Sabeti et al.
• involved in immune regulation

Experimental Design

CD40 ligand (7 SNPs in core, 14 at long distances)

G6PD (11 SNPs in core, 14 at long distances)

[Figure: marker maps oriented relative to the telomere. CD40 ligand (TNFSF5): from -180 kb to +520 kb around the gene. G6PD: from -480 kb to +220 kb around the gene.]

Experimental Design

DNA samples from 231 African men:
• Yoruba (Nigeria)
• Beni (Nigeria)
• Shona (Zimbabwe)

Perfect phase (X chromosome)

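Core haplotypes

[Figure: counts of each core haplotype in Africans and non-Africans. G6PD: core haplotypes 1 to 9, in Africans (230) and non-Africans (95), with the “A-” protective haplotype marked. CD40 ligand: core haplotypes 1 to 6, in Africans (231) and non-Africans (91). The per-haplotype counts are fused in this transcript and cannot be separated reliably.]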

G6PD: long-range haplotype diversity

[Figure: long-range haplotype diversity for each G6PD core haplotype (panels include corehap1, corehap3, corehap4, corehap5, corehap6, corehap7, and corehap8); corehap8 is the “A-” protective haplotype.]

G6PD: homozygosity vs. distance

[Figure: EHH versus distance from the core region (kb).]

G6PD: computer simulation vs. data

[Figure: relative EHH versus core haplotype frequency.]

Core haplotype 8: P << 0.0008
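A P-value of this kind can be read as the share of simulated neutral haplotypes, at a comparable frequency, whose relative EHH is at least as extreme as the observed one. The sketch below shows that idea only; it is not the procedure used for the slides. The function name, the (frequency, relative EHH) input format, and the frequency window of 0.05 are all assumptions.

```python
def empirical_p_value(observed_rehh, observed_freq, simulated, freq_window=0.05):
    """Share of frequency-matched simulated haplotypes with relative EHH >= observed.

    simulated: iterable of (frequency, relative_ehh) pairs from neutral simulations.
    """
    matched = [rehh for freq, rehh in simulated
               if abs(freq - observed_freq) <= freq_window]
    if not matched:
        raise ValueError("no simulated haplotypes in the frequency window")
    exceed = sum(1 for rehh in matched if rehh >= observed_rehh)
    return exceed / len(matched)


if __name__ == "__main__":
    sims = [(0.18, 1.0), (0.20, 2.5), (0.22, 0.8), (0.21, 7.0), (0.60, 9.0)]
    print(empirical_p_value(5.0, 0.20, sims))
```

With this definition the smallest nonzero P-value is one over the number of matched simulations, so bounds such as those quoted on the slides need many simulated haplotypes per frequency bin.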

G6PD: P-values from simulation

[Figure: P-value versus distance from the core region (kb).]

G6PD also stands out in comparison to 7 control regions

[Figure: relative EHH versus core haplotype frequency.]

CD40 ligand: long-range haplotype diversity

[Figure: long-range haplotype diversity for CD40 ligand core haplotypes (panels corehap1 to corehap5); corehap4 is shown again in its own panel.]

CD40 ligand: homozygosity vs. distance

[Figure: EHH versus distance from the core region (kb).]

CD40 ligand: computer simulation vs. data

[Figure: relative EHH versus core haplotype frequency.]

Core haplotype 4: P << 0.0011

CD40 ligand: P-values from simulation

[Figure: P-value versus distance from the core region (kb).]

CD40 ligand also stands out in comparison to 7 control regions

[Figure: relative EHH versus core haplotype frequency.]

Malaria resistance arose in the last 10,000 years in Africa

~2,500 years ago for G6PD

~6,500 years ago for CD40 ligand

Long-range linkage disequilibrium also gives a direct estimate of the date
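The slides do not show how the dates are derived. One common approximation, assumed here, is that two chromosomes descended from a common ancestor g generations ago remain identical out to a genetic distance r (in Morgans) with probability exp(-2 r g), since neither lineage may have recombined in that interval. Solving for g gives the sketch below; the function names and the 25-year generation time are assumptions.

```python
import math


def generations_since_ancestor(homozygosity, genetic_distance_morgans):
    """Solve homozygosity = exp(-2 * r * g) for g."""
    if not 0.0 < homozygosity <= 1.0:
        raise ValueError("homozygosity must be in (0, 1]")
    if genetic_distance_morgans <= 0.0:
        raise ValueError("genetic distance must be positive")
    return -math.log(homozygosity) / (2.0 * genetic_distance_morgans)


def years_since_ancestor(homozygosity, genetic_distance_morgans,
                         years_per_generation=25.0):  # assumed generation time
    """Convert the generation estimate to years."""
    g = generations_since_ancestor(homozygosity, genetic_distance_morgans)
    return g * years_per_generation


if __name__ == "__main__":
    # Hypothetical inputs: homozygosity of 0.5 at 0.25 cM (0.0025 Morgans) from the core.
    print(round(years_since_ancestor(0.5, 0.0025)), "years")
```

The estimate is sensitive to the assumed recombination rate and generation time, so it is best read as an order-of-magnitude figure.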

Traditional tests fail to detect the effect

• Tajima’s D
• HKA test
• McDonald and Kreitman
• Fu and Li’s D
• Ka/Ks ratio

Not significant in our data. The long-range haplotype test is a powerful way to detect selection in the last 10,000 years

Conclusions: Powerful general approach for detecting selection

[Figure: diagram with elements numbered 1 to 5.]

Screen the genome for Positive Selection

Conclusions: Genome-wide screen for natural selection

We can find disease genes without patients!

What’s coming…

1. Generalization of the long-range haplotype test

2. Application of the approach genome-wide

• Haplotype map data set

• Disease gene screen data sets

Acknowledgements for Section 2

Pardis C. Sabeti, John Higgins, Haninah Z.P. Levine, Daniel J. Richter, Stephen F. Schaffner, Stacey Gabriel, Jill V. Platko, Nicholas J. Patterson, Gavin J. McDonald, Hans C. Ackerman, Sarah J. Campbell, David Altshuler, Richard Cooper, Ryk Ward, Eric S. Lander

Note

The 3rd section of the talk is not included here because it presents data that have not yet been published.