FINE SCALE MAPPING

ANDREW MORRIS
Wellcome Trust Centre for Human Genetics
March 7, 2003

Outline

Introduction: fine scale mapping using high-density SNP haplotype data.

Bayesian framework.

Gene trees and the coalescent process.

Genetic heterogeneity and shattered gene trees.

Markov chain Monte Carlo (MCMC) algorithm.

SNP genotype data.

Example: cystic fibrosis.

Introduction

Candidate region of the order of 1Mb in length.

Refine location of putative disease locus within region.

Make use of high-density maps of single nucleotide polymorphisms (SNPs).

Type sample of affected cases and unaffected controls.

Once upon a time…

Disease predisposition determined by single locus in candidate region.

Each case chromosome carries a copy of a disease allele, resulting from a single recent mutation event at disease locus.

Each control chromosome carries a copy of the ancient normal allele at the disease locus.

In an ideal world…

Excess sharing of SNP haplotypes in the vicinity of the disease locus, among cases and not among controls.

Decreased probability of sharing as distance from disease locus increases.

Approximate location of disease locus inferred.

Problems…

Gene tree and ancestral haplotypes are unknown.

Marker mutations lead to mismatch of alleles within preserved regions.

Multiple disease genes, multiple mutations, and dominance.

Example: Cystic fibrosis (CF)

Fully penetrant recessive disorder, incidence ~1/2500 live births in white populations, less common in other populations.

Preliminary linkage analysis suggested 1.8Mb candidate region for a single CF gene on chromosome 7q31.

More recently, a 3bp deletion, ΔF508, has been identified in the CFTR gene at ~0.88Mb into the candidate region.

Now known that ΔF508 accounts for ~66% of all chromosomal mutations in individuals with CF.

Remainder of CF chromosomes carry copies of many other rare mutations in the same gene.

23 RFLPs used to identify haplotypes in 92 control chromosomes and 94 case chromosomes, 62 of which have been confirmed to carry ΔF508.

Challenges…

The ΔF508 locus does not lie at the centre of the region of high LD.

Non-ΔF508 case chromosomes are not expected to share the same founder marker haplotype.

Useful test-data set for fine-scale mapping methods…

Published methods…

Bayesian framework (1)

Assume disease locus exists in candidate region: aim is then to estimate its location.

Approximate the posterior distribution of location.

Allows assignment of probabilities that disease locus lies in any particular area of the candidate region.

Bayesian framework (2)

Aim is to approximate the posterior density of location of the disease locus, given SNP haplotypes in cases A and controls U, denoted f(x|A,U).

Depends on other model parameters M, including gene tree, population haplotype frequencies, etc…

Recover marginal posterior density by integration over these nuisance parameters,

f(x|A,U) = ∫f(x,M|A,U)dM
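
The integral has a simple Monte Carlo reading: if joint draws (x, M) from f(x,M|A,U) are available, dropping M leaves draws from f(x|A,U), and the probability that the disease locus lies in an interval is estimated by the fraction of draws inside it. A minimal Python sketch; the draws and the name interval_probability are hypothetical and only for illustration.

```python
def interval_probability(joint_draws, lower, upper):
    """Estimate P(lower <= x <= upper | data) from joint posterior draws.

    joint_draws: iterable of (x, M) pairs. M is ignored, which is the
    Monte Carlo counterpart of integrating it out.
    """
    xs = [x for x, _nuisance in joint_draws]
    inside = sum(1 for x in xs if lower <= x <= upper)
    return inside / len(xs)


# Hypothetical draws: location x in Mb with a placeholder nuisance value.
draws = [(0.82, "M1"), (0.91, "M2"), (0.78, "M3"), (1.10, "M4"), (0.88, "M5")]
prob = interval_probability(draws, 0.8, 1.0)
print(prob)
```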

Bayesian framework (3)

By Bayes’ Theorem…

f(x,M|A,U) = C f(A,U|x,M) f(x,M)

C: normalising constant.

f(A,U|x,M): likelihood of haplotype data given model parameters M and location x.

f(x,M): prior density of M and x.

Control chromosomes

Assumed to carry an ancient normal allele at the disease locus.

Effects of recent shared ancestry of less importance, so simple model assumed:

f(A,U|x,M) = f(A|x,M) f(U|h)

The likelihood, f(U|h), depends only on population SNP haplotype frequencies, h.

For many SNPs, the number of possible haplotypes is large, so frequencies are parameterised in terms of allele frequencies and first-order LD between pairs of adjacent loci.
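
One way such a parameterisation could look is a first-order chain over biallelic loci, where each locus depends only on its left-hand neighbour. With n SNPs this needs 2n-1 parameters rather than 2^n-1 haplotype frequencies. The sketch below is an assumption about the form of the model, not the talk's exact implementation; the frequencies p, the LD coefficients D, and the function name are hypothetical.

```python
from itertools import product


def haplotype_probability(hap, p, D):
    """Probability of a biallelic haplotype under a first-order chain.

    hap: tuple of 0/1 alleles, one per locus.
    p:   p[i] is the frequency of allele 1 at locus i.
    D:   D[i] is the LD coefficient between loci i and i+1.
    """
    def allele_freq(i, a):
        return p[i] if a == 1 else 1.0 - p[i]

    prob = allele_freq(0, hap[0])
    for i in range(len(hap) - 1):
        a, b = hap[i], hap[i + 1]
        sign = 1.0 if a == b else -1.0
        # Two-locus haplotype frequency: product of allele frequencies +/- D.
        joint = allele_freq(i, a) * allele_freq(i + 1, b) + sign * D[i]
        prob *= joint / allele_freq(i, a)  # P(allele b at i+1 | allele a at i)
    return prob


p = [0.3, 0.5, 0.2]   # hypothetical allele frequencies
D = [0.05, -0.02]     # hypothetical LD between adjacent pairs
total = sum(haplotype_probability(h, p, D) for h in product((0, 1), repeat=3))
print(total)
```

The probabilities of all eight three-locus haplotypes sum to one, which is a quick check that the chain defines a proper distribution.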

Gene trees

Representation of the recent shared ancestry of case chromosomes at the disease locus.

Star shaped tree: each case chromosome descends independently from founder. Overstates the information in the sample about ancestral recombination and mutation events, because shared events are counted once for every chromosome that inherits them.

Bifurcating tree: shared ancestral recombination and mutation events between chromosomes appear only once in their shared ancestry.

Tree specification

Topology T: the branching pattern of the tree.

Branch lengths, τ, determined by the waiting times, w, between merging events in the gene tree.

Scaled in units of 2N generations, where N is effective population size.

[Figure: gene tree with leaf nodes and root labelled.]

Prior probability model

Uniform prior probability model for population haplotype frequencies, the location of disease locus, and the effective population size.

Each gene tree topology has equal prior probability.

Prior probability model reduces to:

f(x,M) = C f(w)

Need prior probability model for waiting times between merging events.

The coalescent process (1)

Waiting time for the merging event that takes k lineages to k-1 lineages.

Scaled in units of 2N generations.

Exponential distribution with rate k(k-1)/2.

With k = 8 lineages: exponential with rate 8 × 7/2 = 28; expected time 1/28 ≈ 0.0357.

With k = 7 lineages: exponential with rate 7 × 6/2 = 21; expected time 1/21 ≈ 0.0476.

With k = 2 lineages: exponential with rate 2 × 1/2 = 1; expected time 1.
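
The rates and expected times quoted for 8, 7 and 2 lineages can be reproduced, and a set of waiting times simulated, with a few lines of Python. This is a sketch of the standard constant-size coalescent in the slide's scaled time units; the function names are made up for illustration.

```python
import random


def coalescent_rate(k):
    """Rate of the merging event that takes k lineages to k-1."""
    return k * (k - 1) / 2


def expected_waiting_time(k):
    """Mean of the exponential waiting time while k lineages remain."""
    return 1.0 / coalescent_rate(k)


def simulate_waiting_times(n, rng):
    """Draw one set of waiting times w_n, ..., w_2 for n sampled lineages."""
    return [rng.expovariate(coalescent_rate(k)) for k in range(n, 1, -1)]


for k in (8, 7, 2):
    print(k, coalescent_rate(k), round(expected_waiting_time(k), 4))

rng = random.Random(1)
times = simulate_waiting_times(8, rng)
tree_height = sum(times)  # scaled time to the most recent common ancestor
expected_height = sum(expected_waiting_time(k) for k in range(2, 9))
```

Summing the expected waiting times gives an expected tree height of 2(1 - 1/n) in scaled units, which is 1.75 for n = 8; most of that height comes from the final merge of the last two lineages.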

The coalescent process (2)

Assumes constant effective population size, N.

Flexible: can allow for exponential population growth and population sub-structure.

Assumes sample is ascertained at random from the population. Problem: case chromosomes ascertained because they carry a copy of the disease mutation.

Assumes sample has single common ancestor. Problem: genetic heterogeneity.

The shattered coalescent model

Generalisation of the coalescent process to allow branches of the gene tree to be removed.

Introduce indicator variable, z_b, for each node, b, taking the value 1 if b has a parent in the gene tree and 0 otherwise.

Allows for singleton leaf nodes, corresponding to sporadic case chromosomes, and disconnected sub-trees, corresponding to independent mutation events at the same disease locus.

Assume number of branches of gene tree not removed in the shattered coalescent process given by binomial distribution, with shattering parameter ρ.
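
A small Python sketch of the shattering step, assuming each branch is kept independently with probability ρ, which is one way to obtain a binomial number of retained branches. The parent-array encoding of the tree and the function names are hypothetical choices for illustration.

```python
import random


def shatter(parent, rho, rng):
    """Sample the indicators z for a gene tree given as a parent array.

    parent[b] is the index of the parent of node b, or None for the root.
    Each branch is kept independently with probability rho (an assumption),
    so the number of kept branches is Binomial(number of branches, rho).
    """
    return [0 if parent[b] is None else int(rng.random() < rho)
            for b in range(len(parent))]


def count_subtrees(z):
    """Every node with z = 0 sits at the top of its own component."""
    return sum(1 for zb in z if zb == 0)


# Hypothetical tree: leaves 0-3, internal nodes 4 and 5, root 6.
parent = [4, 4, 5, 5, 6, 6, None]
z = shatter(parent, rho=0.9, rng=random.Random(0))
print(z, count_subtrees(z))
```

A leaf with z = 0 corresponds to a sporadic case chromosome, and an internal node with z = 0 that keeps its descendants corresponds to an independent mutation event.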

Ancestral haplotypes

Haplotypes, I, carried by internal nodes of the gene tree are unknown.

To calculate posterior probability, need to integrate over distribution of possible ancestral haplotypes, which depends on gene tree and other model parameters.

Treated as augmented data in Bayesian framework: enters posterior probability through likelihood…

f(x|A,U) = ∫ ∫ f(x,M,I|A,U)dMdI

and…

f(x,M,I|A,U) = C f(A,U,I|x,M) f(x,M)

Likelihood calculations

If node has no parent in shattered gene tree, treat as a random chromosome from the population (sporadic or founder for mutation).

If node has parent in genealogy, the likelihood of its haplotype depends on the marker haplotype carried by the parental node, and on the occurrence of recombination and mutation events along the connecting branch.
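
To give a feel for the second case, the sketch below computes a per-marker probability that a child node carries its parent's allele. It is a deliberately simplified stand-in, not the likelihood used in the talk: it assumes recombination and marker mutation occur as independent Poisson events along the branch, and that after either event the allele is a random draw from the population. The default rates are the ones quoted later for the cystic fibrosis analysis (0.5 cM per Mb, 2.5 × 10^-5 per locus per generation); the function name is hypothetical.

```python
import math


def allele_match_probability(d_mb, t_gen, pop_freq,
                             recomb_per_mb=0.005, mut_rate=2.5e-5):
    """P(child carries the parent's allele at a marker), simplified model.

    d_mb:     distance between marker and disease locus (Mb).
    t_gen:    branch length in generations.
    pop_freq: population frequency of the parent's allele at the marker.
    recomb_per_mb is in Morgans per Mb per generation (0.5 cM = 0.005 M).
    """
    # No recombination between disease locus and marker, and no marker
    # mutation, anywhere along the branch.
    p_intact = math.exp(-(recomb_per_mb * d_mb + mut_rate) * t_gen)
    # Otherwise the allele is treated as a random draw from the population.
    return p_intact + (1.0 - p_intact) * pop_freq


near = allele_match_probability(0.1, 100, 0.3)
far = allele_match_probability(0.5, 100, 0.3)
print(round(near, 3), round(far, 3))
```

The probability falls towards the population frequency as the marker moves away from the disease locus or the branch gets longer, which is the decay in sharing that the mapping relies on.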

MCMC algorithm (1)

Need to calculate joint posterior distribution f(x,h,T,w,z,N,ρ,I|A,U).

Parameter space extremely complex, so cannot be calculated analytically.

Markov chain Monte Carlo (MCMC) algorithm approximates the posterior distribution by sampling from f(x,h,T,w,z,N,ρ,I|A,U).

Computationally intensive, but becoming more practical with improvements in computing power.

Can handle missing SNP data: treat as augmented data in the same way as ancestral haplotypes.

MCMC algorithm (2)

Let S denote current set of model parameters {x,h,T,w,z,N,ρ,I}.

Propose “small” change to model parameters, S*.

Accept S* in place of S with probability min{1, f(S*|A,U)/f(S|A,U)}.

If S* is not accepted, the current parameter set S is retained.

Initial burn-in to allow convergence to f(S|A,U) from random starting parameter set.

Subsequent sampling period: parameter set recorded every rth step of the algorithm; each recorded output represents an (approximately) random draw from f(S|A,U).
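
The propose, accept, burn-in and thinning steps can be sketched on a toy problem. The code below is a generic Metropolis sampler with a symmetric proposal, not the talk's sampler: the target is a standard normal density standing in for f(S|A,U), and all names and settings are hypothetical.

```python
import math
import random


def metropolis(log_target, start, propose, n_burn, n_sample, thin, rng):
    """Metropolis sampler with burn-in and thinning (symmetric proposals)."""
    current = start
    current_lp = log_target(current)
    recorded = []
    for step in range(1, n_burn + n_sample + 1):
        candidate = propose(current, rng)
        candidate_lp = log_target(candidate)
        # Accept with probability min(1, f(S*)/f(S)), computed on the log scale.
        if math.log(1.0 - rng.random()) < candidate_lp - current_lp:
            current, current_lp = candidate, candidate_lp
        # After burn-in, record every `thin`-th state.
        if step > n_burn and (step - n_burn) % thin == 0:
            recorded.append(current)
    return recorded


def log_target(s):
    """Toy unnormalised log density (standard normal)."""
    return -0.5 * s * s


def propose(s, rng):
    """'Small' symmetric change to the current state."""
    return s + rng.uniform(-1.0, 1.0)


samples = metropolis(log_target, start=5.0, propose=propose,
                     n_burn=2000, n_sample=20000, thin=50,
                     rng=random.Random(42))
mean = sum(samples) / len(samples)
print(len(samples), round(mean, 3))
```

Only the ratio of posterior densities enters the acceptance step, so the normalising constant C never has to be computed.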

MCMC algorithm (3)

Output  Location  N           Tree height              ρ        Log posterior probability
101     0.47374   2557.62766  4.24189612  10849.19083  0.78104  -1769.51173
102     0.40629   2112.19993  4.16846454  8804.63049   0.79777  -1788.66623
103     0.46534   1679.71719  4.30423786  7229.90233   0.75364  -1854.19049
104     0.48211   2229.24788  4.33740414  9669.14899   0.78009  -1763.70173
105     0.43808   2402.10599  4.29011844  10305.31919  0.82178  -1760.56671
106     0.44607   2275.33453  4.03331587  9177.14285   0.82601  -1775.90300
107     0.41822   3016.70273  4.39000994  13243.35496  0.77768  -1844.20629
108     0.40934   2534.50113  4.07270615  10322.27832  0.81590  -1861.97411
109     0.41032   3122.91416  4.25386813  13284.46504  0.82479  -1814.27448
110     0.45020   3209.14218  4.34316471  13937.83307  0.78422  -1801.44160

Cystic fibrosis: revisited

Assume a fixed recombination rate of 0.5 cM per Mb and a marker mutation rate of 2.5 × 10^-5 per locus, per generation.

Each run of MCMC algorithm begins with 20,000 step burn-in period: thrown away.

Subsequent 200,000 step sampling period, output recorded every 50th step of the algorithm: 4000 outputs.

Two analyses of CF data performed: control chromosomes (92) and (i) ΔF508 case chromosomes (62) only; (ii) all case chromosomes (94).

Cystic fibrosis: summary statistics

Parameter                    ΔF508 subset          All cases
Location x (Mb)              0.864 (0.654-1.040)   0.851 (0.650-1.003)
Shattering parameter ρ       0.935 (0.857-0.985)   0.829 (0.746-0.892)
Time to MRCA (generations)   595 (183-1877)        824 (246-3257)

Entries: estimate (interval).

Cystic fibrosis: genetic heterogeneity

Structure of shattered gene tree provides information about genetic heterogeneity at disease locus.

For each output of MCMC algorithm, record shattered gene tree.

For each pair of chromosomes, record whether they appear in the same sub-tree.

Over all outputs, estimate probability that each pair of chromosomes carry the same allele at the disease locus.

Cluster chromosomes according to these probabilities: cladogram to represent genetic heterogeneity.
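
The pairwise step can be sketched as follows, assuming each recorded output has been reduced to one sub-tree label per case chromosome. The labels, the toy data and the function name are hypothetical.

```python
from itertools import combinations


def same_subtree_probabilities(outputs):
    """Fraction of MCMC outputs in which each pair shares a sub-tree.

    outputs: list of outputs; each output is a list giving, for every
    case chromosome, a label for the sub-tree that contains it.
    """
    n = len(outputs[0])
    counts = {pair: 0 for pair in combinations(range(n), 2)}
    for labels in outputs:
        for i, j in counts:
            if labels[i] == labels[j]:
                counts[(i, j)] += 1
    return {pair: c / len(outputs) for pair, c in counts.items()}


# Hypothetical sub-tree labels for 4 chromosomes over 4 recorded outputs.
outputs = [
    ["a", "a", "b", "c"],
    ["a", "a", "a", "b"],
    ["a", "a", "b", "b"],
    ["a", "b", "b", "c"],
]
probs = same_subtree_probabilities(outputs)
print(probs[(0, 1)], probs[(2, 3)])
```

One minus each probability could then serve as a distance for a standard hierarchical clustering routine, which would yield a cladogram of the kind described.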

SNP genotype data

SNP haplotypes rarely available.

Could infer haplotypes from SNP genotype data: PHASE, SNPHAP, HAPLOTYPER algorithms.

Better to treat haplotypes as augmented data in Bayesian framework…

f(x|G) = ∫ ∫ ∫ ∫ f(x,M,I,A,U|G)dMdIdAdU

and…

f(x,M,I,A,U|G) = C f(A,U,I|x,M) f(x,M)
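
Treating haplotypes as augmented data means moving over the haplotype pairs that are consistent with each observed genotype. The sketch below enumerates that set for one individual, assuming biallelic SNPs with genotypes coded as the number of copies of allele 1; the coding and the function name are assumptions for illustration.

```python
from itertools import product


def compatible_haplotype_pairs(genotype):
    """List unordered haplotype pairs consistent with an unphased genotype.

    genotype: sequence of 0, 1 or 2 = copies of allele 1 at each SNP.
    """
    het_sites = [i for i, g in enumerate(genotype) if g == 1]
    base = [1 if g == 2 else 0 for g in genotype]  # homozygous sites are fixed
    pairs = set()
    for phase in product((0, 1), repeat=len(het_sites)):
        first, second = list(base), list(base)
        for site, allele in zip(het_sites, phase):
            first[site] = allele
            second[site] = 1 - allele
        # Sort so that (h1, h2) and (h2, h1) count as the same pair.
        pairs.add(tuple(sorted((tuple(first), tuple(second)))))
    return sorted(pairs)


pairs = compatible_haplotype_pairs([1, 2, 1, 0, 1])
for h1, h2 in pairs:
    print(h1, h2)
```

An individual heterozygous at h sites has 2^(h-1) compatible pairs for h ≥ 1, so the phase uncertainty grows quickly with the number of SNPs.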

Cystic fibrosis: revisited – again!

Create genotype data from original CF haplotype data.

Pair together case chromosomes at random.

Pair together control chromosomes at random.

Total sample: 46 controls and 47 cases.
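
The pairing step is easy to sketch. The code below uses randomly generated placeholder haplotypes over 23 markers, not the CF data, and assumes each marker is biallelic and coded 0/1; the function name is hypothetical.

```python
import random


def pair_at_random(haplotypes, rng):
    """Shuffle chromosomes, pair them up, and return unphased genotypes."""
    shuffled = list(haplotypes)
    rng.shuffle(shuffled)
    genotypes = []
    for k in range(0, len(shuffled) - 1, 2):
        h1, h2 = shuffled[k], shuffled[k + 1]
        # Genotype = number of copies of allele 1 at each marker.
        genotypes.append(tuple(a + b for a, b in zip(h1, h2)))
    return genotypes


rng = random.Random(2003)
# Placeholder haplotypes: 94 case and 92 control chromosomes, 23 markers each.
cases = [tuple(rng.randint(0, 1) for _ in range(23)) for _ in range(94)]
controls = [tuple(rng.randint(0, 1) for _ in range(23)) for _ in range(92)]
case_genotypes = pair_at_random(cases, rng)
control_genotypes = pair_at_random(controls, rng)
print(len(case_genotypes), len(control_genotypes))
```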

Cystic fibrosis: genotypes v haplotypes

Parameter                     Genotypes             Haplotypes
Location x (Mb)               0.855 (0.625-1.137)   0.851 (0.650-1.003)
Shattering parameter ρ        0.842 (0.771-0.901)   0.829 (0.746-0.892)
Effective population size N   375 (107-871)         846 (367-1657)

Entries: estimate (interval).

Limitations

Computationally intensive – limited to sample sizes ~100 cases and controls with up to 20 SNPs.

Alternative approach: do not model gene tree explicitly – estimate shattered gene tree using standard clustering methods.

Summary

High density SNP map of the human genome now available.

Fine scale mapping of disease loci requires effective modelling of shared ancestry of sample of case and control chromosomes.

Methods exist for haplotype and genotype data: MCMC algorithms are very computationally intensive and are currently limited to relatively small sample sizes.

Further development is necessary…

