Lecture 3: LD and haplotypes (full)

Post on 04-Nov-2014

Linkage disequilibrium and haplotypes

Introduction

All about my classes

LekkiWood@Gmail.com

• Lectures are stand alone - No preparation needed except for previous course content.

• Nearly always provide additional resources
  - Take home exercise
  - Papers referenced
  - Resources such as other lecture slides

All about me….

All about you….

Try to always orient you to the session

• Go over the theory of linkage disequilibrium and haplotypes

• Calculate linkage disequilibrium by hand
• Relaxing session: story of HapMap
• Lab: today I will walk you through a hand-holding look at HapMap.
• Each part is ~30 minutes, so please spend extra time familiarizing yourself with HapMap.

Try to give you your learning objectives

• Primary objectives
  • Describe linkage disequilibrium and a haplotype
  • Explain the meaning of r² = 1.0, r² = .8 and r² = .5
  • Find a region of interest (ROI) on HapMap
  • Locate tagSNPs for an ROI on HapMap
• Secondary objectives
  • Describe how mutations and recombination give rise to linkage disequilibrium and haplotypes
  • Calculate D, D’ and r² by hand
  • List key differences between D, D’ and r²
  • Evaluate the contribution of HapMap to public health genetics

Part 1: Haplotype and Linkage Disequilibrium theory

One source of variation in our DNA occurs through mutation events….

[Figure: ancestral population, a mutation event, and the resulting population carrying alleles A and C]

Mutations that proliferate are ‘SNPs’

• Single Nucleotide Polymorphisms
• The most common type of variation in DNA
• Substitution of 1 nucleotide for another
• 2/3 of SNPs involve C -> T
• Definition is evolving:
  • Old definition: SNPs must be seen in at least 1% of the population
• SNPs occur ~ every 300 bp
• Therefore ~ 10 million SNPs in the human genome

The number of mutations increases over time

[Figure: a 1st mutation event at one locus (A/C) and a 2nd mutation event at another locus (G/C)]

Proliferating SNPs give rise to haplotypes

• A haplotype is “A specific set of DNA variants observed on a single chromosome, or part of a chromosome”

• In practice, usually referring to a set of SNPs within a single gene

Haplotypes:

[Figure: chromosomes in the population carrying different allele combinations at the two SNPs]

Haplotype 1: AG

Haplotype 2: CG

Haplotype 3: CC

Haplotype 4: AC

Resolve the population haplotypes!

Two SNPs (G/C and A/T): GA, CA, GT, CT

Three SNPs (G/C, A/T and G/T): GAG, CAG, GTG, CTG, GAT, CAT, GTT, CTT

How many possible haplotypes?

Two SNPs (G/C and A/T): GA, CA, GT, CT

2² = 4

Three SNPs (G/C, A/T and G/T): GAG, CAG, GTG, CTG, GAT, CAT, GTT, CTT

2³ = 8

How many possible haplotypes?

2 (alleles) to the power of n loci: 2^n
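The counting rule can be checked by listing the combinations directly. The following is a minimal sketch, assuming biallelic loci and using the allele pairs from the slide example (G/C, A/T, G/T); the function name `possible_haplotypes` is made up for illustration.

```python
from itertools import product

def possible_haplotypes(loci):
    """Return every allele combination, taking exactly one allele from each locus."""
    return ["".join(combo) for combo in product(*loci)]

# Allele pairs taken from the slide example; each locus is assumed biallelic.
two_loci = possible_haplotypes([("G", "C"), ("A", "T")])
three_loci = possible_haplotypes([("G", "C"), ("A", "T"), ("G", "T")])

print(two_loci)         # ['GA', 'GT', 'CA', 'CT']
print(len(two_loci))    # 4, i.e. 2**2
print(len(three_loci))  # 8, i.e. 2**3
```

This counts the haplotypes that are possible in principle; the haplotypes actually observed in a population are often far fewer.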

How many haplotypes does a person have for a given chromosomal region?

[Figure: a person's pair of chromosomes over the region, with G/C at the first locus and A/T at the second]

But what if the person is homozygous at both loci?

Possible haplotypes in the population: GA, CA, GT, CT

A person homozygous at both loci (C/C and T/T): CT, CT, CT, CT

Haplotype overview

• Method of characterizing variation at more than one locus on a chromosome

• Only 1 allele from each locus
• But as many alleles as there are loci on the chromosome… IF those loci contain variation (SNPs)

• As with SNPs, each person has 2 haplotypes, which (as with SNPs) may be the same

• The number of possible haplotypes in the population is 2 to the power of n loci.

Variation in our DNA also occurs through recombination

[Figure: before recombination the haplotypes are AG, CG and CC; after recombination a new haplotype, AC, also appears]

The number of recombination events increases over time

Our chromosomes are mosaics….

• The extent and conservation of pieces depends on:
  • Recombination rate
  • Mutation rate
  • Population size
  • Natural selection

What do these mosaics mean….

…. For our haplotypes?

Key concept….

…. alleles often co-occur at greater than chance levels

Linkage Disequilibrium (LD)

• The nonrandom association of alleles at different loci

• Equilibrium – when things are ‘in balance’ or as we would expect

• When a particular allele at one locus is found together on the same chromosome with a specific allele at a second locus, more often than expected if the loci were segregating independently in a population. The loci are in disequilibrium – it is out of balance, or not what we would expect

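Linkage disequilibrium is a measurable trait

Determined by space and time

Time decreases linkage disequilibrium (more generations allow more recombination events)

Space decreases linkage disequilibrium (recombination is more likely between loci that are further apart)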
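The effect of time and space can be illustrated with the standard textbook decay model, which is not on the slides: under random mating, D shrinks by a factor of (1 - c) each generation, where c is the recombination fraction between the two loci. A minimal sketch under that assumption, with hypothetical starting values:

```python
def ld_after_generations(d0, recomb_fraction, generations):
    """D after some generations, assuming random mating: D_t = D_0 * (1 - c)**t."""
    return d0 * (1.0 - recomb_fraction) ** generations

d0 = 0.25  # hypothetical starting disequilibrium

# "Time": the same pair of loci, followed over more generations.
for t in (0, 10, 100):
    print(t, ld_after_generations(d0, 0.01, t))

# "Space": loci further apart recombine more often (larger c), so D decays faster.
near = ld_after_generations(d0, 0.001, 50)
far = ld_after_generations(d0, 0.1, 50)
print(near > far)  # True
```

The model ignores drift, selection and mutation, all of which also shape LD in real populations.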

Summary of part 1

• Mutations give rise to SNPs
• SNPs give rise to haplotypes
• A haplotype is a specific set of DNA variants
• Recombination patterns lead to linkage disequilibrium
• Linkage disequilibrium is when we see haplotypes more often than expected by chance

Questions before we proceed to calculating LD?

Part 2

Calculating Linkage Disequilibrium

All about Punnett squares….

                      Locus B
                      B            b            Totals
Locus A     A         $P_{AB}$     $P_{Ab}$     $P_A$
            a         $P_{aB}$     $P_{ab}$     $P_a$
Totals                $P_B$        $P_b$        1.0

2 loci; A: A/a, B: B/b. What are our haplotypes?

All about Punnett squares (in LD calculation)….

• Each cell contains the frequency of a haplotype
• Row & column ends contain the frequency of an allele
• When you sum the row totals, or the column totals, you should get 1.0

Measures of Linkage Disequilibrium

• (A Little History lesson)
• Three measures of LD:
  • D
  • D’
  • r²

Measures of Linkage Disequilibrium - D

• 1960 Lewontin & Kojima
• D – unstandardized measure of how far the association between two alleles differs from that expected by chance

Linkage Equilibrium

$P_{AB} = P_A P_B$

Linkage Disequilibrium

$P_{AB} \neq P_A P_B$

$D = P_{AB} - P_A P_B$

Linkage Disequilibrium – an example

Given the following haplotype frequencies, are the alleles in linkage disequilibrium?

$P_{AB} = .2$, $P_{Ab} = .5$, $P_{aB} = .3$, $P_{ab} = .0$

i.e. what is D?

$D = P_{AB} - P_A P_B$

Step 1: Complete the Punnett square

$P_{AB} = .2$, $P_{Ab} = .5$, $P_{aB} = .3$, $P_{ab} = .0$

                      Locus B
                      B       b       Totals
Locus A     A         .2      .5      .7
            a         .3      .0      .3
Totals                .5      .5      1.0

Step 2: Calculate allele frequencies

$P_A = P_{AB} + P_{Ab} = .2 + .5 = .7$
$P_a = P_{aB} + P_{ab} = .3 + .0 = .3$
$P_B = P_{AB} + P_{aB} = .2 + .3 = .5$
$P_b = P_{Ab} + P_{ab} = .5 + .0 = .5$

Step 3: Calculate D

$P_A = .7$, $P_a = .3$, $P_B = .5$, $P_b = .5$

$D = P_{AB} - P_A P_B = .2 - (.7 \times .5) = .2 - .35 = -.15$

Are the alleles in linkage disequilibrium?
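Steps 1 to 3 can be reproduced in a few lines. This is a minimal sketch of the hand calculation, assuming the four haplotype frequencies sum to 1; the function name `ld_d` is made up for illustration.

```python
def ld_d(p_AB, p_Ab, p_aB, p_ab):
    """D = P_AB - P_A * P_B, from the four haplotype frequencies of a 2x2 table."""
    if abs(p_AB + p_Ab + p_aB + p_ab - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")
    p_A = p_AB + p_Ab  # row total for allele A
    p_B = p_AB + p_aB  # column total for allele B
    return p_AB - p_A * p_B

d = ld_d(0.2, 0.5, 0.3, 0.0)
print(round(d, 2))  # -0.15
```

D is not zero, so the AB haplotype is seen less often than the allele frequencies alone would predict.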

Measures of Linkage Disequilibrium - D

Problems:
• Sign is arbitrary
• Range depends on allele frequencies

Measures of Linkage Disequilibrium – D’

• 1964 Lewontin
• D’ – standardizes D to the maximum possible value it can take

• $D' = D / D_{max}$ when D is positive, $D' = D / D_{min}$ when D is negative

Step 4: Calculate Dmax/min

$P_A = .7$, $P_a = .3$, $P_B = .5$, $P_b = .5$, $D = -.15$

• Where D is positive: $D_{max}$ = the lesser of $P_A P_b$ or $P_a P_B$

• Where D is negative: $D_{min}$ = the larger of $-P_A P_B$ or $-P_a P_b$

What is our Dmax/min?

$D_{min} = \max\{-.7 \times .5,\ -.3 \times .5\} = \max\{-.35,\ -.15\} = -.15$

Step 5: Calculate D’

$D = -.15$, $D_{min} = -.15$

$D' = D / D_{min} = -.15 / -.15 = 1$
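The sign-dependent choice of bound in Steps 4 and 5 is easy to get wrong by hand, so here is a minimal sketch of it. The function name `d_prime` is made up for illustration, and returning 0.0 when a locus has no variation is an assumption of this sketch rather than something the slides specify.

```python
def d_prime(p_AB, p_Ab, p_aB, p_ab):
    """D' = D / Dmax when D is positive, D / Dmin when D is negative."""
    p_A = p_AB + p_Ab
    p_a = 1.0 - p_A
    p_B = p_AB + p_aB
    p_b = 1.0 - p_B
    d = p_AB - p_A * p_B
    if d >= 0:
        bound = min(p_A * p_b, p_a * p_B)    # Dmax: the lesser of the two products
    else:
        bound = max(-p_A * p_B, -p_a * p_b)  # Dmin: the larger of the two negatives
    if bound == 0:
        return 0.0  # assumption: report 0 when a locus is monomorphic
    return d / bound

print(round(d_prime(0.2, 0.5, 0.3, 0.0), 2))  # 1.0
```

Because a negative D is divided by a negative Dmin, this convention always yields a D’ between 0 and 1.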

Measures of Linkage Disequilibrium – D’

• D’ = +/- 1 = complete LD
  • No evidence for recombination
  • Ancestral haplotype not disrupted

Problems
• D’ is inflated in small N
• D’ is inflated with rare alleles
• No information on allele frequency

Measures of Linkage Disequilibrium – r2

• 1968 Hill & Robertson
• r² = squared correlation coefficient between alleles at 2 loci

Step 6: Calculate r²

$P_A = .7$, $P_a = .3$, $P_B = .5$, $P_b = .5$, $D = -.15$

$r^2 = D^2 / (P_A P_a P_B P_b)$

$r^2 = (-.15)^2 / (.7 \times .3 \times .5 \times .5) = .0225 / .0525 = .43$
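The same calculation as a minimal sketch, assuming both loci are polymorphic so that the denominator is not zero; the function name `r_squared` is made up for illustration.

```python
def r_squared(p_AB, p_Ab, p_aB, p_ab):
    """r^2 = D^2 / (P_A * P_a * P_B * P_b) from the four haplotype frequencies."""
    p_A = p_AB + p_Ab
    p_a = 1.0 - p_A
    p_B = p_AB + p_aB
    p_b = 1.0 - p_B
    denominator = p_A * p_a * p_B * p_b
    if denominator == 0:
        raise ValueError("r^2 is undefined when a locus has no variation")
    d = p_AB - p_A * p_B
    return d * d / denominator  # D is squared, so its sign does not matter

print(round(r_squared(0.2, 0.5, 0.3, 0.0), 2))  # 0.43
```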

Measures of Linkage Disequilibrium – r²

• r² ranges from 0 to 1
• 1 = two markers give identical information

Problems

What can we learn from our 3 measures of LD?

• D = -.15
• D’ = 1.0
• r² = .43

D’ vs r²

• Both are a measure of association with 1 being the maximum, and indicating most LD

• BUT r² can only reach 1 when the allele frequencies at the two loci are equal.

Perfect LD

• Equal allele frequency
• Allelic association is as strong as possible
  – 2 haplotypes observed
  – No detected recombination between SNPs

D´ = 1, r² = 1

Complete LD

• Unequal allele frequency
  – 3 haplotypes observed
  – No detected recombination between SNPs

D´ = 1, r² < 1
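The contrast between perfect and complete LD shows up when the same formulas are applied to two tables. In this sketch the "perfect" table (two haplotypes at frequency .5 each) is a made-up illustration, while the "complete" table is the worked example from Part 2; the helper `ld_measures` is hypothetical and assumes both loci are polymorphic.

```python
def ld_measures(p_AB, p_Ab, p_aB, p_ab):
    """Return (D, D', r^2) for one 2x2 table; assumes both loci are polymorphic."""
    p_A, p_B = p_AB + p_Ab, p_AB + p_aB
    p_a, p_b = 1.0 - p_A, 1.0 - p_B
    d = p_AB - p_A * p_B
    bound = min(p_A * p_b, p_a * p_B) if d >= 0 else max(-p_A * p_B, -p_a * p_b)
    return d, d / bound, d * d / (p_A * p_a * p_B * p_b)

# Perfect LD: only AB and ab are observed, and allele frequencies are equal.
perfect = ld_measures(0.5, 0.0, 0.0, 0.5)
# Complete LD: three haplotypes are observed, and allele frequencies are unequal.
complete = ld_measures(0.2, 0.5, 0.3, 0.0)

print(perfect)                                      # (0.25, 1.0, 1.0)
print(round(complete[1], 2), round(complete[2], 2)) # 1.0 0.43
```

In both tables at least one of the four haplotypes is missing, which is what drives D’ to 1; only the two-haplotype table lets one SNP predict the other exactly, which is what r² = 1 means.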

Calculate your own Linkage Disequilibrium measures of D, D’ and r²

$P_{AB} = .6$, $P_{Ab} = .1$, $P_{aB} = .2$, $P_{ab} = .1$

At the end of the day…..

Linkage disequilibrium is the non random association of markers [SNPs] at two or more loci

….. But what does this mean for applying genetics to public health? (finally we get there….)

Part 3: Using LD in genetic studies: The HapMap consortium

The Human Genome Project

dbSNP

Cystic Fibrosis

Inflammatory bowel disease

• Likely had many causal variants
• Heritable: MZ > DZ
• 10% of those with IBD had 1 relative with IBD
• Reasonable linkage signal on Chr 5
• What could explain this structure?

Inflammatory bowel disease

5q31

8 SNPs: GGACAACCAATTCGGG

Haplotype Map

• Add to Human Genome Project with information on diversity

• How did HapMap and Human genome project differ?

• ‘Chunks’ of data

8 SNPs: GGACAACCAATTCGGG

“Short cuts”

[Figure: a sequence region with its SNPs and the haplotypes observed across them]

SNPs 1, 3 and 4 are TagSNPs

HapMap

• Launched in 2001
• Open access resource for all researchers
• In real time
• Spin off from The Human Genome Project
• Qu: What was the key difference between the HGP and HapMap?
• Characterizes LD across the genome
• Also develops analytic tools
  • Haploview

HapMap

“The success of the HapMap will be measured in terms of the genetic discoveries enabled, and improved knowledge of disease aetiology.”

HapMap

Mark Daly: “The community’s response after a number of years of struggling and to not finding genetic factors for complex disease”.

HapMap – Phase 1

• Launched in 2001; Production 2002-3
• Phase I
• Not comprehensive
• 90 Yoruba individuals
• 90 individuals of European descent
• 45 Han Chinese
• 45 Japanese
• 1,000,000 SNPs

HapMap – Phase 1

Minor allele frequency

HapMap – Phase I

• Released in 2005
• 1 million SNPs
• August 2006, “dbSNP included more than ten million SNPs, and more than 40% of them were known to be polymorphic. By comparison, at the start of the project, fewer than 3 million SNPs were identified, and no more than 10% of them were known to be polymorphic.”

HapMap – an LD plot

HapMap – Phase I

Recombination hotspots are widespread and account for LD structure

HapMap – Phase I

Tagger

Table 7. Number of selected tag SNPs to capture all observed common SNPs in the Phase I HapMap for the three analysis panels, using pairwise tagging at different r² thresholds

Pairwise      YRI        CEU        CHB+JPT
r² ≥ 0.5      324,865    178,501    159,029
r² ≥ 0.8      474,409    293,835    259,779
r² = 1        604,886    447,579    434,476
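The table shows that a stricter r² threshold requires more tag SNPs. A simplified greedy sketch of pairwise tagging illustrates why. This is not Tagger's actual implementation, and the SNP names and r² values below are hypothetical.

```python
def greedy_tag_snps(snps, r2, threshold=0.8):
    """Greedily pick tag SNPs until every SNP is captured at r^2 >= threshold."""
    def captures(tag, other):
        if tag == other:
            return True  # a SNP always tags itself
        return r2.get(frozenset((tag, other)), 0.0) >= threshold

    untagged = set(snps)
    tags = []
    while untagged:
        # Choose the SNP that captures the most SNPs that are still untagged.
        best = max(snps, key=lambda s: sum(captures(s, o) for o in untagged))
        tags.append(best)
        untagged -= {o for o in untagged if captures(best, o)}
    return tags

# Hypothetical pairwise r^2 values; pairs not listed are treated as r^2 = 0.
snps = ["snp1", "snp2", "snp3", "snp4"]
r2 = {
    frozenset(("snp1", "snp2")): 1.0,
    frozenset(("snp1", "snp3")): 0.85,
    frozenset(("snp3", "snp4")): 0.6,
}

print(greedy_tag_snps(snps, r2, threshold=0.8))  # ['snp1', 'snp4']
print(greedy_tag_snps(snps, r2, threshold=1.0))  # ['snp1', 'snp3', 'snp4']
```

At the lower threshold snp1 stands in for snp2 and snp3; at r² = 1 it can only stand in for snp2, so more SNPs must be typed directly.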

Will tag SNPs picked from HapMap apply to other population samples?

Population differences add very little inefficiency (stolen slide from ASHG... I can’t source this)

[Figure: CEU (Utah residents with European ancestry, CEPH) compared with Whites from Los Angeles, CA, and with Botnia, Finland]

HapMap – Phases II and III

• Phase II
  • >3.1 million genetic variants
  • Captured 90 to 96 percent of common genetic variation
• Phase III
  • 1,301 samples from 11 populations

HapMap and Public Health

• How has HapMap helped us in the quest to find genes for disorders?

What is next for HapMap?

• 1,000 Genomes Project

Part 4

HapMap Practical

Goals of this lab

Part 1
1. Find HapMap SNPs near a gene.
2. View patterns of LD amongst the SNPs.
3. Select tag SNPs.
4. Download information on the SNPs for use in Haploview.
5. Evaluate genotype data in a paper against HapMap data.
Part 2
6. Make a file from data for use in Haploview

Data origin

Goals of this lab

Part 1
1. Find HapMap SNPs near a gene.
> Navigate to HapMap
> Using release #27 (Phase 3) locate the LRP1 gene (hint: it is a landmark).
> Answer questions 1-3

1. Go to hapmap.ncbi.nlm.nih.gov

2. Select release 2, Phase #3

3. Put LRP1 in the search box

5. Look at the information

6. Turn different tracks on and off

(Don’t forget ‘update image’)

7. Count the genotyped SNPs

8. Create an LD plot

9. Choose tag SNPs

10. Download LRP1 data & open in Haploview

11. Open in Haploview, Answer questions 4-7

Slide graveyard

4. Look at the different PPARy

[Figure: three SNPs (T/C, A/G and G/T) in the sequence C T G A C T A A G T A C C G A give 8 possible haplotypes (Haplotype 1 to Haplotype 8), but only 3 haplotypes are observed]

TAGTATTGGTGTCAGCATCGGCGT

1. Information about our population

• Factors that influence linkage disequilibrium:
  • Genetic drift
  • Mutation
  • Founder effects
  • Selection
  • Stratification

• Factors that maintain linkage disequilibrium:
  • Selection
  • Non-random mating
  • Linkage

• Mainstay of ‘population genetics’

2. Interpretation of our findings

• Genetic association is correlational; therefore, we cannot make causal inferences
  • SNP1 -> Trait
  • SNP1 and SNP2 are in LD
  • We don’t know which is the true causal variant

Linkage Disequilibrium coefficient D

Equilibrium: $P_{AB} = P_A P_B$

$D_{AB} = P_{AB} - P_A P_B$

$P_{AB} = P_A P_B + D_{AB}$

Problems:
• Sign is arbitrary
• Range depends on allele frequencies

Q: Why are these problems for applied genetics in public health?

Calculating Linkage Equilibrium

                      Locus B
                      B            b            Totals
Locus A     A         $P_{AB}$     $P_{Ab}$     $P_A$
            a         $P_{aB}$     $P_{ab}$     $P_a$
Totals                $P_B$        $P_b$        1.0

Linkage Equilibrium

$P_{AB} = P_A P_B$

$P_{Ab} = P_A P_b = P_A (1 - P_B)$

$P_{aB} = P_a P_B = (1 - P_A) P_B$

$P_{ab} = P_a P_b = (1 - P_A)(1 - P_B)$
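These four products are the haplotype frequencies expected when the loci are independent. A minimal sketch that computes them, using the allele frequencies from the worked example in Part 2 as input; the function name is made up for illustration.

```python
def expected_haplotype_frequencies(p_A, p_B):
    """Haplotype frequencies expected under linkage equilibrium (independent loci)."""
    p_a, p_b = 1.0 - p_A, 1.0 - p_B
    return {
        "AB": p_A * p_B,
        "Ab": p_A * p_b,
        "aB": p_a * p_B,
        "ab": p_a * p_b,
    }

expected = expected_haplotype_frequencies(0.7, 0.5)
for haplotype, freq in expected.items():
    print(haplotype, round(freq, 2))
```

With P_A = .7 and P_B = .5 the expected frequency of AB is .35, while the observed frequency in the worked example was .2; the difference between observed and expected is D.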

Linkage Disequilibrium coefficient D

Equilibrium: $P_{AB} = P_A P_B$

$D_{AB} = P_{AB} - P_A P_B$

Problems:
• Sign is arbitrary
• Range depends on allele frequencies

Q: Why are these problems for applied genetics in public health?

S.M. Bray, J.G. Mulle, A.F. Dodd, A.E. Pulver, S. Wooding and S.T. Warren. Signatures of founder effects, admixture and selection in the Ashkenazi Jewish population. PNAS Early Edition (2010).

8 possible haplotypes:

Haplotype 1: C T G A C T A A G T A C C G A
Haplotype 2: C T G A C T A A G T A C C T A
Haplotype 3: C T G A C T A G G T A C C G A
Haplotype 4: C T G A C T A G G T A C C T A
Haplotype 5: C C G A C T A A G T A C C G A
Haplotype 6: C C G A C T A A G T A C C T A
Haplotype 7: C C G A C T A G G T A C C G A
Haplotype 8: C C G A C T A G G T A C C T A

Measures of Linkage Disequilibrium - D

• 1960s Lewontin & Kojima
• D – unstandardized measure of how far the association between two alleles differs from that expected by chance

Then we get recombination

[Figure: before recombination the haplotypes are AG, CG and CC; after recombination the haplotype AC also appears]

[Figure: haplotypes in the ancestor compared with haplotypes in the present day]

Recombination on an individual level

Measures of Linkage Disequilibrium - D

• At single locus: Aa PA = (1-Pa)

[Figure: 8 possible SNP combinations across three SNPs in the sequence C T G A C T A A G T A C C G A]

Refresher

• Recombination

Sources of variation in our DNA

New Concept – Linkage Disequilibrium

• Linkage Disequilibrium is the tendency for 2 (or more) SNPs to be inherited together

• AATAAGCCTGATC
• ATTAAGCCTGATC
• AATTAGCCTGATC
• ATTAAGGCTGATC

Why is this important?

• Allows us to genotype only certain SNPs of the genome…

• ….. We can infer more than we type

Haplotype

• Inheritance of a cluster of SNPs
• “Haploid” + “Genotype”