Rare variant analysis in large-scale association and sequencing studies
Eleftheria Zeggini
Missing heritability in complex traits
Interactions
Structural variation
Epigenetics and environment
Thousands of very small effects
Large phenotype-genotype heterogeneity
Locus heterogeneity and rare variants
Low frequency and rare variants
Low frequency (0.01<MAF<0.05) and rare variation (MAF<0.01) can contribute to complex common phenotypes
Rare variants can have higher penetrance, contribute to more extreme phenotypes and may be more useful as predictive markers
Accessing low frequency and rare variants through:
– GWAS
– imputation
– re-sequencing
Rare variant analysis
Single-point analysis of rare variants is under-powered
Approximate sample sizes (cases+controls, equally sized) required to attain 80% power to detect an allelic OR=2.0 at α=5×10−8 increase dramatically as MAF decreases:
MAF     Sample size
0.05    2,500
0.01    12,000
0.001   117,000
An alternative is to use multivariate methods to combine information across multiple variant sites
Several locus-specific approaches have been proposed
– collapsing methods
– allele-matching methods
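The sample sizes quoted for each MAF can be roughly reproduced with a standard two-proportion calculation on allele counts. This is a minimal sketch, not the calculation used for the slide: it assumes that MAF is the allele frequency in controls, that the test is two-sided, and that each individual contributes two independent alleles. The function name `total_sample_size` is hypothetical.

```python
from statistics import NormalDist


def total_sample_size(maf, odds_ratio=2.0, alpha=5e-8, power=0.80):
    """Approximate cases + controls (equal groups) for an allelic test.

    Assumption: `maf` is the allele frequency in controls.
    """
    norm = NormalDist()
    z_alpha = norm.inv_cdf(1 - alpha / 2)  # two-sided significance threshold
    z_beta = norm.inv_cdf(power)           # quantile for the target power

    p0 = maf
    # Allele frequency in cases implied by the allelic odds ratio.
    p1 = odds_ratio * p0 / (1 + (odds_ratio - 1) * p0)
    p_bar = (p0 + p1) / 2

    numerator = (
        z_alpha * (2 * p_bar * (1 - p_bar)) ** 0.5
        + z_beta * (p0 * (1 - p0) + p1 * (1 - p1)) ** 0.5
    ) ** 2
    alleles_per_group = numerator / (p1 - p0) ** 2
    individuals_per_group = alleles_per_group / 2  # two alleles per person
    return 2 * individuals_per_group               # cases + controls


for maf in (0.05, 0.01, 0.001):
    print(maf, round(total_sample_size(maf)))
```

Under these assumptions the results fall in the same range as the tabulated values and show the same steep growth as MAF falls.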
Rare variant analysis methods: challenges
Imputation
– Genotype-associated probabilities
Resequencing
– Genotype call uncertainty
– False positive rate
Probability that a variant is functional
Family-based designs
Extreme distribution ends designs
Incorporating multiple covariates
Correlation structure
Direction of effect
Meta-analysis
Collapsing methods
[Figure: collapsing schematic; per-variant values p_i = 0.2, 0.1, 0.0, 0.2; equation in the symbols r_i, m_i, y, β, x]
ARIEL: Accumulation of Rare variants Integrated and Extended Locus-specific test
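The idea behind collapsing can be sketched with a generic proportion-based score: for each individual, the fraction of rare-variant sites in the locus at which a minor allele is carried, compared between cases and controls. This is an illustrative sketch only, not ARIEL itself; the permutation test, the function names and the toy genotypes are all assumptions made for the example.

```python
import random
from statistics import mean


def collapsed_scores(genotypes):
    """Per individual: proportion of sites carrying at least one minor allele.

    `genotypes` is a list of individuals, each a list of minor-allele
    counts (0, 1 or 2) at the rare-variant sites of one locus.
    """
    return [sum(1 for g in person if g > 0) / len(person) for person in genotypes]


def permutation_p_value(case_scores, control_scores, n_perm=2000, seed=1):
    """Two-sided permutation test on the difference in mean collapsed score."""
    rng = random.Random(seed)
    observed = abs(mean(case_scores) - mean(control_scores))
    pooled = list(case_scores) + list(control_scores)
    n_case = len(case_scores)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign case/control labels at random
        diff = abs(mean(pooled[:n_case]) - mean(pooled[n_case:]))
        if diff >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Toy genotypes (made up for illustration): 3 cases, 3 controls, 4 sites.
cases = [[1, 0, 1, 0], [0, 1, 0, 0], [1, 1, 0, 0]]
controls = [[0, 0, 0, 0], [0, 0, 1, 0], [0, 0, 0, 0]]

case_scores = collapsed_scores(cases)
control_scores = collapsed_scores(controls)
print(case_scores, control_scores)
print(permutation_p_value(case_scores, control_scores))
```

Because every carried variant adds to the score in the same direction, a score of this kind loses power when a locus mixes risk and protective variants.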
Allele-matching methods
Compare similarity scores between cases and controls at each SNP, then sum over SNPs: KBAT (Mukhopadhyay et al, Gen Epi 2009)
[Figure: example per-SNP similarity scores for cases and controls; extracted values: 2 4 4 4 4 2 and 2 4 0 4 4 4]
Extended to account for uncertainty: AMELIA (Allele-Matching Empirical Locus Integrated Association test)
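Similarity scores of 0, 2 and 4 are what an allele-match count produces when every allele of one genotype is compared with every allele of another. The sketch below assumes that scoring and uses a deliberately simplified per-SNP contrast (mean within-group similarity minus mean between-group similarity, summed over SNPs). It is not the published KBAT statistic; `allele_match` and `locus_contrast` are hypothetical names.

```python
from itertools import combinations, product
from statistics import mean


def allele_match(g1, g2):
    """Count matching allele pairs between two genotypes.

    Genotypes are coded as minor-allele counts (0, 1 or 2), so the
    result is 0, 2 or 4.
    """
    alleles1 = [1] * g1 + [0] * (2 - g1)
    alleles2 = [1] * g2 + [0] * (2 - g2)
    return sum(a == b for a in alleles1 for b in alleles2)


def locus_contrast(cases, controls):
    """Sum over SNPs of (mean within-group similarity - mean between-group similarity).

    Requires at least two individuals per group.
    """
    total = 0.0
    for j in range(len(cases[0])):
        case_g = [person[j] for person in cases]
        ctrl_g = [person[j] for person in controls]
        within = [allele_match(a, b) for a, b in combinations(case_g, 2)]
        within += [allele_match(a, b) for a, b in combinations(ctrl_g, 2)]
        between = [allele_match(a, b) for a, b in product(case_g, ctrl_g)]
        total += mean(within) - mean(between)
    return total


print(allele_match(0, 0), allele_match(0, 1), allele_match(0, 2), allele_match(1, 1))
print(locus_contrast([[0], [0]], [[2], [2]]))
```

A similarity-based contrast does not depend on the sign of each variant's effect, which is why methods of this family remain powerful when risk and protective variants occur in the same locus.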
Power comparison
1000 replications, d=0.02, Q=0.05, non-consensus SNP quality scores, 1000 cases/1000 controls, causal variants are of high quality (phred score 10; probability of correct base-call 0.90)
• in the presence of different directions of effect, allele-matching methods are much more powerful than collapsing methods
• accounting for uncertainty increases power
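The pairing of a phred score of 10 with a 0.90 probability of a correct base-call follows from the standard phred definition, Q = −10·log10(P_error). A minimal sketch of the conversion (the function name is chosen for this example):

```python
def phred_to_p_correct(q):
    """Probability that a base-call is correct, given its phred score."""
    p_error = 10 ** (-q / 10)
    return 1 - p_error


print(f"{phred_to_p_correct(10):.2f}")  # 0.90
```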
Power comparisons using 500 cases/500 controls and 1000 cases/1000 controls, when causal variants are of high quality (phred score 10; probability of correct base-call 0.90)
• the power of the allele-matching methods increases further over the collapsing methods with increasing sample size
• accounting for uncertainty increases power
Population isolates
• The study of rare variants can be empowered by focusing on isolated populations, in which rare variants may have increased in frequency and linkage disequilibrium tends to be extended
• Need deeply-phenotyped isolated population samples
• Whole-genome sequencing in a subset of samples and imputation out into the full set of GWASed samples
• Association with traits of interest
Analysis of rare variants in 1000 genomes-imputed data
Osteoarthritis
• Osteoarthritis (OA) is characterised by cartilage degeneration in synovial joints leading to pain and loss of function, particularly in the hip and the knee
• OA is a common complex disease with environmental and genetic components affecting 40% of people over the age of 70 years
• Current treatments: analgesics, total joint replacement (TJR)
• To date only two loci have been robustly associated with OA
• Common variants (MAF>0.20), small effect sizes (OR~1.15)
3,177 cases, 4,894 controls
– Directly typed SNPs (Illumina 610k)
– Imputed SNPs: HapMap
– Imputed SNPs: 1000 genomes
[Figure: three panels labelled Directly-typed; Directly-typed and HapMap-based imputation; Directly-typed, HapMap-based imputation and 1KGP-based imputation]
Study                      Cases   Controls  Effect allele  MAF     OR (95% CI)       P value
arcOGEN GWAS               3177    4894      A              0.0718  1.32 (1.16-1.50)  1.67×10−5
arcOGEN replication set 1  5165    6155      A              0.0694  1.17 (1.06-1.30)  2.60×10−3
GOAL                       1686    743       A              0.0720  1.23 (0.99-1.56)  7.20×10−2
arcOGEN replication set 2  2409    2319      A              0.0636  1.16 (0.98-1.37)  7.86×10−2
deCODE                     1552    3071      A              0.0917  1.03 (0.88-1.20)  7.31×10−1
EGCUT                      2617    2619      A              0.0769  1.16 (1.01-1.34)  4.01×10−2
RSI                        1950    3243      G              0.0608  1.01 (0.86-1.20)  8.61×10−1
RSII                       485     1460      A              0.0715  1.46 (1.07-2.00)  1.68×10−2
Meta-analysis              19041   24504     A                      1.17 (1.11-1.23)  2.07×10−8
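The meta-analysis row can be approximately reproduced from the per-study odds ratios with a fixed-effect, inverse-variance-weighted combination on the log-OR scale. This is a sketch under stated assumptions, not a description of the analysis actually run: standard errors are back-calculated from the rounded 95% confidence intervals, and every OR is taken as printed, including the RSI row whose listed effect allele is G.

```python
import math

Z95 = 1.959964  # standard normal quantile for a two-sided 95% interval

# (study, OR, lower 95% CI, upper 95% CI), as printed in the table
studies = [
    ("arcOGEN GWAS", 1.32, 1.16, 1.50),
    ("arcOGEN replication set 1", 1.17, 1.06, 1.30),
    ("GOAL", 1.23, 0.99, 1.56),
    ("arcOGEN replication set 2", 1.16, 0.98, 1.37),
    ("deCODE", 1.03, 0.88, 1.20),
    ("EGCUT", 1.16, 1.01, 1.34),
    ("RSI", 1.01, 0.86, 1.20),
    ("RSII", 1.46, 1.07, 2.00),
]


def fixed_effect_meta(rows):
    """Inverse-variance fixed-effect pooling of odds ratios."""
    weight_sum = 0.0
    weighted_beta_sum = 0.0
    for _, odds_ratio, lower, upper in rows:
        beta = math.log(odds_ratio)
        # Standard error recovered from the width of the CI on the log scale.
        se = (math.log(upper) - math.log(lower)) / (2 * Z95)
        weight = 1 / se ** 2
        weight_sum += weight
        weighted_beta_sum += weight * beta
    beta_pooled = weighted_beta_sum / weight_sum
    se_pooled = math.sqrt(1 / weight_sum)
    return (
        math.exp(beta_pooled),
        math.exp(beta_pooled - Z95 * se_pooled),
        math.exp(beta_pooled + Z95 * se_pooled),
    )


pooled_or, ci_low, ci_high = fixed_effect_meta(studies)
print(f"{pooled_or:.2f} ({ci_low:.2f}-{ci_high:.2f})")
```

Because the inputs are rounded to two decimal places, the pooled estimate and interval can be expected to agree with the reported row only approximately, and the P value is not recovered exactly.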
Intron 4 of the guanine nucleotide exchange factor-encoding gene MCF2L
Studies of Mcf2l in rat models of OA have shown expression in articular chondrocytes
In human cells, MCF2L regulates neurotrophin-3-induced cell migration in Schwann cells. Neurotrophin-3 is a member of the nerve growth factor (NGF) family, and inhibition of NGF has an effect on the pain experienced by OA patients
PE sequencing
PE library preparation
Long-range PCR
Pulldown
Data processing and statistical analysis
Analysis of rare variants in sequence data
Targeted resequencing
Whole-genome and whole-exome resequencing
500 Exomes Project
– Collaborative exome resequencing experiment between the Sanger Institute, GSK and Lausanne University
– Study design:
  – 500 individuals from the CoLaus cohort with BMI>25
  – 250 with type 2 diabetes and 250 normoglycaemic matched controls
– Affymetrix 500k GWAS data
– Exome sequencing
– Mean depth ~65x
500 Exomes Project – preliminary data
Number of cases: 195
Number of controls: 166
Number of transcripts analyzed: 14,924
[Figure panels: Single-point, ARIEL, AMELIA]
UK10K project
Rare genetic variants in health and disease
4,000 whole genomes: population-based cohorts with rich phenotype data
6,000 whole exomes: obesity, neurodevelopmental disorders and further rare diseases
Aims
• Elucidate singleton variants by maximising variation detected
• Directly associate genetic variations to phenotypic traits
• Uncover rare variants contributing to disease
• Assign uncovered variations into genotyped cohort and case/control collections
• Provide a sequence variation resource for future studies
www.uk10k.org
Acknowledgements
Andrew Morris, Jenn Asimit, Reedik Magi
A.G. Day-Williams, L. Southam, K. Panoutsopoulou, N.W. Rayner, T. Esko, K. Estrada, H.T. Helgadottir, A. Hofman, T. Ingvarsson, H. Jonsson, A. Keis, H.J.M. Kerkhof, G. Thorleifsson, N.K. Arden, A. Carr, K. Chapman, P. Deloukas, J. Loughlin, A. McCaskie, W.E.R. Ollier, S.H. Ralston, T.D. Spector, G.A. Wallis, J.M. Wilkinson, N. Aslam, F. Birell, I. Carluke, J. Joseph, A. Rai, M. Reed, K. Walker, S.A. Doherty, I. Jonsdottir, R.A. Maciewicz, K.R. Muir, A. Metspalu, F. Rivadeneira, K. Stefansson, U. Styrkarsodottir, A.G. Uitterlinden, J.B.J. van Meurs, W. Zhang, A.M. Valdes, M. Doherty, arcOGEN Consortium
500 Exomes Project
A partnership between the Wellcome Trust Sanger Institute, the CoLaus principal investigators and the Quantitative Sciences dept. of GlaxoSmithKline
GSK: Vincent Mooser, John Whittaker, Linda McCarthy, Matt Nelson, Claudio Verzilli, Judong Shen, Stephanie Chissoe, Charles Cox, Meg Ehm, Keith Nangle, Dana Fraser, Kijoung Song, Peter Woollard, Dawn Waterworth
Lausanne: Peter Vollenweider, Gerard Waeber, Jacques Beckmann, Sven Bergmann, Pedro Marques Vidal, Murielle Bochud, Zoltan Kutalik
Wellcome Trust Sanger Institute: Jennifer Asimit, Ines Barroso, Caren Brockington, Yuan Chen, Aaron Day-Williams, Richard Durbin, Martin Hunt, Sarah Hunt, Matt Hurles, Jimmy Liu, Margarida Lopes, Daniel MacArthur, Aarno Palotie, Theo Papamarkou, Fliss Payne, Manj Sandhu, Carol Scott, Lorraine Southam, Ioanna Tachmazidou, Chris Tyler-Smith, Ellie Wheeler, Bendik Winsvold, Yali Xue, Eleftheria Zeggini
Principal Applicants
Leena Peltonen, Wellcome Trust Sanger Institute
Richard Durbin, Wellcome Trust Sanger Institute
Co-applicants
Jeffrey Barrett, Wellcome Trust Sanger Institute
Ines Barroso, Wellcome Trust Sanger Institute
George Davey-Smith, University of Bristol
Ismaa Sadaf Farooqi, University of Cambridge
Matthew Hurles, Wellcome Trust Sanger Institute
Stephen O'Rahilly, University of Cambridge
Aarno Palotie, Wellcome Trust Sanger Institute
Nicole Soranzo, Wellcome Trust Sanger Institute
Tim Spector, King's College London
Eleftheria Zeggini, Wellcome Trust Sanger Institute
Named collaborators
Phil Beales, University College London
Jamie Bentham, University of Oxford
Shoumo Bhattacharya, University of Oxford
Patrick Bolton, King's College London
Gerome Breen, King's College London
Krishnan Chatterjee, University of Cambridge
Laura K Curran, King's College London
Anne Farmer, King's College London
David Fitzpatrick, Edinburgh University
Daniel Geschwind, UCLA, USA
Steve Humphries, University College London
Jouko Lonnqvist, National Public Health Institute, Finland
Peter McGuffin, King's College London
Lucy Raymond, University of Cambridge
David Savage, University of Cambridge
Peter Scambler, University College London
Robert Semple, University of Cambridge
David St Clair, University of Aberdeen
Lennart von Wendt, University of Helsinki, Finland
Supported by the Wellcome Trust, Arthritis Research UK, Pfizer