Recombination, and haplotype structure
Simon Myers, Gil McVean
Department of Statistics, Oxford
The starting point
• We have a genome’s worth of data on genetic variation
• We wish to understand why the haplotype structure looks the way it does
– Differences between regions, populations
Where do haplotypes come from?
• In the absence of recombination, the most natural way to think about haplotypes is in terms of the genealogical tree representing the history of the chromosomes
• Tree affects mutation patterns
• Mutation patterns give information on tree
What determines the shape of the tree?
[Figure: ancestry of the current population and ancestry of the sample, each traced back from the present day]
The coalescent: a model of genealogies
[Figure: a coalescent genealogy. Time runs back from the present day; ancestral lineages merge at coalescence events until the most recent common ancestor (MRCA)]
Simulating histories with the coalescent
Simulating data with the coalescent
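A minimal sketch of what such a simulation can look like, assuming the standard Kingman coalescent (while k lineages remain, the waiting time to the next coalescence is exponential with rate k(k-1)/2) and infinite-sites mutation at rate theta/2 per unit branch length. The function names, the parameter `theta` and the time scaling are illustrative assumptions, not taken from the slides.

```python
import random

def poisson(rng, mean):
    """Draw a Poisson count by counting rate-1 exponential arrivals before `mean`."""
    count, t = 0, rng.expovariate(1.0)
    while t < mean:
        count += 1
        t += rng.expovariate(1.0)
    return count

def simulate_coalescent(n, theta, seed=None):
    """Simulate one genealogy for n chromosomes and drop mutations on it.

    Returns (time to the MRCA, list of n haplotypes as 0/1 lists).
    Sites are listed in the order they were generated, not by position.
    """
    rng = random.Random(seed)
    # Each lineage is (samples descending from it, time at which it began).
    lineages = [(frozenset([i]), 0.0) for i in range(n)]
    now = 0.0
    sites = []  # each site is the set of samples carrying the derived allele
    while len(lineages) > 1:
        k = len(lineages)
        now += rng.expovariate(k * (k - 1) / 2.0)  # wait for next coalescence
        i, j = rng.sample(range(k), 2)             # pick the pair that merges
        merged = frozenset()
        for idx in (i, j):
            leaves, start = lineages[idx]
            # Mutations on this branch are inherited by all samples below it.
            for _ in range(poisson(rng, theta / 2.0 * (now - start))):
                sites.append(leaves)
            merged |= leaves
        lineages = [l for idx, l in enumerate(lineages) if idx not in (i, j)]
        lineages.append((merged, now))
    haplotypes = [[1 if s in site else 0 for site in sites] for s in range(n)]
    return now, haplotypes
```

Calling `simulate_coalescent(6, 5.0, seed=1)` gives a tree height and a 6-row haplotype matrix. Because there is no recombination in this sketch, every simulated site corresponds to a single branch of one tree.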
Haplotype structure in the absence of recombination
• In the absence of recombination, the shape of the tree and where mutations fall on it determine patterns of haplotype structure
• Two mutations on the same branch will be in complete association; mutations on different branches will show weaker, often low, association
[Figure: two examples of pairwise association between mutations, r2 = 1 and r2 = 0.04]
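The association measure r2 can be computed directly from haplotype data. The sketch below assumes biallelic sites coded 0/1 and uses the usual definition r2 = D^2 / (pA(1-pA)pB(1-pB)), with D = pAB - pA*pB; the function name is illustrative.

```python
def r_squared(site_a, site_b):
    """r^2 between two biallelic sites, each a list of 0/1 alleles per chromosome."""
    n = len(site_a)
    p_a = sum(site_a) / n
    p_b = sum(site_b) / n
    # Frequency of chromosomes carrying the derived allele at both sites.
    p_ab = sum(1 for a, b in zip(site_a, site_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both sites must be polymorphic")
    return d * d / denom
```

Two mutations on the same branch are carried by exactly the same chromosomes, so `r_squared([1, 1, 0, 0, 0, 0], [1, 1, 0, 0, 0, 0])` is 1. Mutations on different branches, such as `[1, 1, 0, 0, 0, 0]` and `[0, 0, 1, 0, 0, 0]`, give a much lower value (0.1 here).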
Haplotypes when there is recombination
• When there is no recombination, haplotype structure reflects the age distribution of mutations and the shape of the underlying tree
• When there is some recombination, every nucleotide position has a tree, but the tree changes along the chromosome at a rate determined by the local recombination landscape
• By using SNP information to inform us about the trees, we can learn about how quickly the trees change
– This relates to the recombination rate
A bit of recombination ‘shuffles’ genetic variation
Lots of recombination does lots of shuffling
Recombination and haplotype diversity
• Without recombination, a new mutation can create at most one new haplotype
– Any two mutations delineate at most 3 haplotypes in total (ancestral, plus two new types)
• With recombination, this mutation can spread onto every existing haplotype background, creating the potential for more haplotypes
• For a given number of SNPs, a region with recombination will tend to have (in comparison to a region with no recombination)
– More haplotypes
– Less variance in the pairwise differences between haplotypes
– Less skewed haplotype frequencies
The ancestral recombination graph
• The combined history of recombination, mutation and coalescence is described by the ancestral recombination graph
[Figure: an example ancestral recombination graph with its events labelled: two mutations, one recombination and four coalescences]
In humans, recombination is not uniformly distributed
• Most recombination occurs in recombination hotspots – short (1-2kb) regions every 50-100kb that occupy at most 3% of the genome but probably account for 90% or more of the recombination
• This means that haplotype structure in humans is an interesting hybrid between the no recombination and lots of recombination situations
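As a rough worked figure, take the two bounds above at face value: hotspots cover 3% of the sequence and carry 90% of the recombination. Treating the bounds as exact values is an assumption made only for illustration. The per-base recombination rate inside hotspots relative to outside is then:

```latex
\[
\frac{0.90 / 0.03}{0.10 / 0.97} = 30 \times 9.7 = 291
\]
```

Since hotspots occupy at most 3% and account for 90% or more, a ratio of roughly 300-fold is a lower bound under these numbers.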
Learning about recombination
• Just like there is a true genealogy underlying a sample of sequences without recombination, there is a true ARG underlying samples of sequences with recombination
• We can consider nonparametric and parametric ways of learning about recombination
• There are useful nonparametric ways of learning about recombination which we will consider first
– These really only apply to species, such as humans, where we can be fairly sure that most SNPs are the result of a single ancestral mutation event
The signal of recombination?
[Figure: recurrent mutation versus recombination (an ancestral chromosome recombines)]
Detecting recombination from DNA sequence data
• Look for all pairs of “incompatible” sites
• Find the minimum number of intervals in which recombination events must have occurred (Hudson and Kaplan 1985): Rm
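The two steps above can be sketched as follows. Two sites are treated as incompatible when all four two-site haplotypes (00, 01, 10, 11) are present, which under a single-mutation-per-site assumption requires a recombination event between them. Rm is then obtained by greedily choosing non-overlapping incompatible intervals. Function names and the 0/1 input coding are illustrative assumptions.

```python
from itertools import combinations

def incompatible(haps, i, j):
    """Four-gamete test: True if sites i and j show all four allele combinations."""
    return len({(h[i], h[j]) for h in haps}) == 4

def hudson_kaplan_rm(haps):
    """Minimum number of recombination intervals implied by incompatible site pairs."""
    n_sites = len(haps[0])
    intervals = [(i, j) for i, j in combinations(range(n_sites), 2)
                 if incompatible(haps, i, j)]
    # Greedy scan by right endpoint: count intervals that share no interior
    # with the previously counted one (touching at a site is allowed).
    intervals.sort(key=lambda iv: iv[1])
    rm, last_right = 0, 0
    for left, right in intervals:
        if left >= last_right:
            rm += 1
            last_right = right
    return rm
```

For the haplotypes 000, 011, 101 and 110, every pair of sites is incompatible and the function returns 2, one event in each of the two intervals between adjacent sites.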
Improving the detection algorithm
• Rm greatly underestimates the amount of recombination in the history of a set of sequences
• Myers and Griffiths (2003) developed an improved way of detecting recombination events
– Without recombination, every new mutation can create only a single new haplotype
– With recombination, mutations can be shuffled between haplotype backgrounds, generating haplotype diversity
– Each recombination makes at most one new haplotype
– If I see H haplotypes with S segregating sites, at least H-S-1 recombination events must have occurred
• This offers potential to identify many more recombination events
– Carefully combine bounds from different collections of sites
– Dynamic programming algorithm makes computation extremely fast
– Better (sometimes slower) algorithms developed recently
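The H-S-1 bound for a single collection of sites can be sketched as below. This is only the local bound, floored at zero; the step that combines bounds across different collections of sites by dynamic programming is not shown. The function name and the optional `sites` argument are illustrative assumptions.

```python
def haplotype_bound(haps, sites=None):
    """Lower bound H - S - 1 on recombination events for the chosen sites.

    haps  : list of haplotypes, each a sequence of 0/1 alleles
    sites : optional list of site indices to restrict to (default: all sites)
    """
    if sites is None:
        sites = range(len(haps[0]))
    # Keep only sites that actually vary in the sample (segregating sites).
    seg = [c for c in sites if len({h[c] for h in haps}) > 1]
    # Count distinct haplotypes over those sites.
    distinct = {tuple(h[c] for c in seg) for h in haps}
    return max(0, len(distinct) - len(seg) - 1)
```

With all eight possible haplotypes at three sites the bound is 8-3-1 = 4. For the four haplotypes 000, 011, 101, 110 the bound over all three sites is 0, but restricting to the first two sites gives 1, which is why combining bounds over subsets of sites matters.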
[Figure: tree-pairs where we cannot see recombination events]
[Figure: a tree-pair where we could see recombination events, but don’t]
Problems with ‘counting’ recombination events
Modelling recombination
• Model-based approaches to learning about recombination allow us to ask more detailed questions than nonparametric approaches
– What is the rate of recombination (as opposed to just the number of events)?
– Does gene A have a higher recombination rate than gene B?
– Is the rate of recombination across a region constant?
– Where are the recombination hotspots?
• We can use coalescent-based approximations to calculate the likelihood of arbitrary recombination maps given observed data
Fitting a variable recombination rate
• Use a reversible-jump MCMC approach (Green 1995)
[Figure: a piecewise-constant rate map over SNP positions, with cold and hot blocks. Moves: split blocks, merge blocks, change block size, change block rate]
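Acceptance rates:
\[
\alpha(\Theta, \Theta') = \min\left\{ 1,\ \frac{L_C(\Theta')}{L_C(\Theta)} \cdot \frac{q(\Theta', \Theta)}{q(\Theta, \Theta')} \cdot \frac{\pi(\Theta')}{\pi(\Theta)} \cdot \left| \frac{\partial(\Theta', u')}{\partial(\Theta, u)} \right| \right\}
\]
where \Theta is the current rate map, \Theta' the proposed one, and u, u' the random numbers used by the move.
– L_C(\Theta') / L_C(\Theta): composite likelihood ratio
– q(\Theta', \Theta) / q(\Theta, \Theta'): Hastings ratio
– \pi(\Theta') / \pi(\Theta): ratio of priors
– |\partial(\Theta', u') / \partial(\Theta, u)|: Jacobian of partial derivatives relating changes in dimension to sampled random numbers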
• Include a prior on the number of change points that encourages smoothing
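The accept/reject step built from the four labelled terms (composite likelihood ratio, Hastings ratio, ratio of priors, Jacobian) can be sketched in log space as follows. This is a generic Metropolis-Hastings-Green acceptance step under assumed function names, not the authors' implementation; how each term is computed for a particular move is left out.

```python
import math
import random

def log_acceptance(log_clr, log_hastings_ratio, log_prior_ratio,
                   log_abs_jacobian=0.0):
    """Log of min(1, product of the four ratios), computed in log space."""
    return min(0.0, log_clr + log_hastings_ratio + log_prior_ratio
               + log_abs_jacobian)

def accept(log_alpha, rng=random):
    """Accept the proposed move with probability exp(log_alpha)."""
    return rng.random() < math.exp(log_alpha)
```

Moves that keep the number of blocks fixed (changing a block's size or rate) leave the dimension unchanged, which is why the Jacobian term defaults to log 1 = 0 here. A prior that penalises extra change points enters through `log_prior_ratio`.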
Strong concordance between fine-scale rate estimates from sperm and genetic variation
Rates estimated from sperm: Jeffreys et al (2001)
Rates estimated from genetic variation: McVean et al (2004)
Inferring hotspots
• We perform a statistical test for hotspot presence
• Based on an approximation to the coalescent similar to that used for rate estimation
• All previously identified hotspots are 1-2kb in size
– At a position in genome, consider where 2kb hotspot might be present
– Fit a model with hotspot
– Fit one without
– Compare in terms of (approximate) likelihood ratio test
– Evaluate significance via simulation
– When p-value below threshold, declare a hotspot
Rates and hotspots across the human genome
[Figure: hotspots throughout the human genome (35,000 identified). From Myers et al. (2005)]
Applications of recombination approaches to real data
• Rates and hotspots across the human genome (Myers et al. 2005)
– Previously, no understanding of why hotspots localise where they do
– Can 35,000 hotspots, accounting for >50% of human recombination, help?
• Comparison of recombination rates (Winckler et al. 2005, Ptak et al. 2005)
– Between humans and chimpanzees
– At individual recombination hotspots
• Understanding genomic rearrangements (Myers et al., submitted!)
– Cause a number of “genomic disorders”
– Relationship to recombination hotspots
32,996 Phase II HapMap hotspots
• Estimated 50-70% of all human recombination
• Hotspots on all chromosomes, including X
• ~20,000 hotspots localised to within 5kb
• THE1B (LTR of retrotransposon): found in 1196 hotspots versus 606 coldspots (p << 10^-20)
• AluY: found in 3635 hotspots versus 3262 coldspots (p = 7 x 10^-5)
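[Figure: the hotspot motif on three repeat backgrounds. Each panel shows three motif variants, the repeat consensus and three counts, listed here in the order they appear]
THE1 consensus: ...CTTCCGCCATGATTGTGAGGCCTCCCCAGCCATGTGGAACTGTGAGTCCATT...
– Motif variants: CCTCCCTAGCCAC, CCNCCNTNNCCNC, CCTCCCCNNCCAT
– Counts: (n=165), (n=263), (n=10,690)
AluY, AluSc, AluSg consensus: ...CTCCTGACCTCGTGATCCGCCCGCCTCGGCCTCCCAAAGTGCTGGGATTACAG...
– Motif variants: CCGCCTTGGCCTC, CCNCCNTNNCCNC, CCGCCTCNNCCTC
– Counts: (n=14,028), (n=15,706), (n=55,916)
L2 consensus: ...TGTCACCTCCTCAGAGAGGCCTTCCCTGACCACCCTATCTAAAATWGCACACC...
– Motif variants: CCTCCCTGACCAC, CCNCCNTNNCCNC, CTTCCCTNNCCAC
– Counts: (n=157), (n=6,901), (n=1,211)
Panel annotations: ~3-4% of hotspots; ~3-4% of hotspots, including DNA3; ~3-4% of hotspots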
Human hotspot motifs
• In humans, specific words produce recombination hotspot activity
• Hotspot motif CCTCCCTNNCCAC (p < 10^-33)
– Raises probability of a hotspot across genetic backgrounds
– Degenerate versions CCNCCNTNNCCNC and truncated CCTCCCT also raise probability, to lesser extent
– Motif explains ~40% of human hotspots
– Operates in both sexes
– We do not yet know clearly which hotspots
– On THE1 background, hotspot 70-80% of time!
• Biology not clearly understood
• We identified a second, different hotspot motif (the best 9bp motif), CCCCACCCC, also by comparison of hot and cold regions of the genome
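Scanning a sequence for the motif and its degenerate versions can be sketched with a regular expression in which N stands for any base. The function names are illustrative, only the forward strand is searched, and overlapping matches are reported; a genome-wide scan would also need the reverse complement.

```python
import re

def motif_to_regex(motif):
    """Turn a motif such as CCNCCNTNNCCNC into a regex; N matches any base.

    The lookahead wrapper lets finditer report overlapping occurrences.
    """
    body = "".join("[ACGT]" if base == "N" else base for base in motif)
    return re.compile("(?=(" + body + "))")

def find_motif(seq, motif):
    """Return the 0-based start positions of all matches on the forward strand."""
    return [m.start() for m in motif_to_regex(motif).finditer(seq.upper())]
```

For example, `find_motif("AACCTCCCTAGCCACTT", "CCTCCCTNNCCAC")` returns `[2]`, and the degenerate motif CCNCCNTNNCCNC matches at the same position. A sequence carrying only the truncated CCTCCCT matches neither 13bp pattern.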
Variation in individual hotspots
Sequence variation affects recombination at DNA2 (Jeffreys and Neumann, Nature Genetics 2002)
SNPs disrupting hotspots disrupt motifs!
• DNA2:
– Hot: AAAAGACAGCCTCCCTGTTGCTGC
– Cold: AAAAGACAGCCCCCCTGTTGCTGC
• NID1:
– Hot: CACCCCCCACCCCACCCCAACATA
– Cold: CACCTCCCACCCCACCCCAACATA
Jeffreys and Neumann (Nature Genetics 2002, Hum Mol Genet 2005)
• DNA2: disruption of CCTCCCT, best 7bp motif
• NID1: disruption of CCCCACCCC, best 9bp motif
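The disruption can be checked by counting motif occurrences in each allele. The sequences below are copied from the slide, and the hot/cold labels follow the order in which the slide lists them. The helper names and the dictionary layout are illustrative assumptions.

```python
def count_occurrences(seq, motif):
    """Count exact occurrences of motif in seq, allowing overlaps."""
    width = len(motif)
    return sum(1 for i in range(len(seq) - width + 1)
               if seq[i:i + width] == motif)

# Allele sequences as printed on the slide (hot allele first, cold second).
ALLELES = {
    "DNA2": {"motif": "CCTCCCT",
             "hot": "AAAAGACAGCCTCCCTGTTGCTGC",
             "cold": "AAAAGACAGCCCCCCTGTTGCTGC"},
    "NID1": {"motif": "CCCCACCCC",
             "hot": "CACCCCCCACCCCACCCCAACATA",
             "cold": "CACCTCCCACCCCACCCCAACATA"},
}

def motif_counts(alleles=ALLELES):
    """Return {hotspot: (count in hot allele, count in cold allele)}."""
    return {name: (count_occurrences(a["hot"], a["motif"]),
                   count_occurrences(a["cold"], a["motif"]))
            for name, a in alleles.items()}
```

At DNA2 the single copy of CCTCCCT in the hot allele is lost in the cold allele. At NID1 the hot allele carries two overlapping copies of CCCCACCCC and the cold allele retains one, so in these printed fragments the SNP removes a copy without eliminating the motif entirely.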
Role of motif in X-linked ichthyosis
• The 1kb deletion hotspot contains 25 repeats of CCTCCCTNNCCAC
• Highest motif density in any LCR in entire genome
• Strongly implicates motif in producing hotspot
• Points to a link between deletion-causing and “normal” recombination
[Figure: deletion breakpoint hotspot (Van Esch et al. 2005); labels: VCX2, 1/5000 births]
A more general link?
• Many other diseases are caused by recombination-mediated deletions and duplications (NAHR)
– Smith-Magenis syndrome (hotspot)
– CMT1A (hotspot)
– NF1 microdeletion syndrome (hotspot)
– DiGeorge syndrome…
• Two recent studies suggest normal hotspots and hotspots of disease-causing deletion may coincide
– de Raedt, Stephens et al. (Nature Genetics, 2006): two NF1 deletion hotspots both likely to coincide with crossover hotspots
– Lindsay et al. (ASHG, 2006): CMT1A deletion hotspot associated with crossover hotspot
Other “major” NAHR hotspots
• CCNCCNTNNCCNC overrepresented in hotspots (p=0.0006)
Evolution of recombination – human vs. chimps
[Figure: human and chimp panels showing LDhat rate estimates and LDhot hotspots]
• No significant correlation in hotspot positions between species (Winckler et al. Science 2005, Ptak et al. Nature Genetics 2005)
Reading
• Haplotype structure and recombination
– The International HapMap Consortium: A haplotype map of the human genome. Nature 2005, 437:1299-1320.
– McVean G, Spencer CCA, Chaix R: Perspectives on human genetic variation from the International HapMap Project. PLoS Genetics 2005, 1:e54.
– Myers S, Bottolo L, Freeman C, McVean G, Donnelly P: A fine-scale map of recombination rates and hotspots across the human genome. Science 2005, 310:321-324.
• The coalescent
– Nordborg M: Coalescent Theory. In The Handbook of Statistical Genetics (eds Balding, Bishop and Cannings), 2001. Wiley & Sons.
– Hudson RR: Gene genealogies and the coalescent process. In Oxford Surveys in Evolutionary Biology (eds Futuyma and Antonovics) 1990, 7:1-44. Oxford University Press.
Selected references
- Jeffreys, A.J., L. Kauppi, and R. Neumann. 2001. Intensely punctate meiotic recombination in the class II region of the major histocompatibility complex. Nat Genet 29: 217-222.
- Jeffreys, A.J. and R. Neumann. 2002. Reciprocal crossover asymmetry and meiotic drive in a human recombination hot spot. Nat Genet 31: 267-271.
- Jeffreys, A.J. and R. Neumann. 2005. Factors influencing recombination frequency and distribution in a human meiotic crossover hotspot. Hum Mol Genet 14: 2277-2287.
- Myers, S., L. Bottolo, C. Freeman, G. McVean, and P. Donnelly. 2005. A fine-scale map of recombination rates and hotspots across the human genome. Science 310: 321-324.
- Ptak, S.E., D.A. Hinds, K. Koehler, B. Nickel, N. Patil, D.G. Ballinger, M. Przeworski, K.A. Frazer, and S. Paabo. 2005. Fine-scale recombination patterns differ between chimpanzees and humans. Nat Genet 37: 429-434.
- The International HapMap Consortium. 2005. A haplotype map of the human genome. Nature 437: 1299-1320.
- The International HapMap Consortium. 2007. The Phase II HapMap. Nature
- Winckler, W., S.R. Myers, D.J. Richter, R.C. Onofrio, G.J. McDonald, R.E. Bontrop, G.A. McVean, S.B. Gabriel, D. Reich, P. Donnelly et al. 2005. Comparison of fine-scale recombination rates in humans and chimpanzees. Science 308: 107-111.