
(Genome-wide) association analysis (nitro.biosci.arizona.edu/workshops/GIGA/pdfs/L3-GWAS.pdf)

Date post: 21-Jul-2020
(Genome-wide) association analysis. Peter M. Visscher, Queensland Institute of Medical Research, Brisbane, Australia. [email protected]
Transcript
Page 1:

(Genome-wide) association analysis

Peter M. Visscher
Queensland Institute of Medical Research
Brisbane, Australia

[email protected]

Page 2:

Outline

• Association vs linkage
• Linkage disequilibrium
• Analysis: single SNP
• GWAS: design, power
• GWAS: analysis

Page 3:

Linkage        Association
Families       Populations

Page 4:

Linkage disequilibrium around an ancestral mutation

[Ardlie et al. 2002]

Page 5:

Linkage and association

m = functional mutation

[Cardon & Bell 2001]

Page 6:

Why mapping by LD?

• Use association across families
  – no pedigree information needed
• Higher resolution of mapping
• Dense SNP maps: GWAS

Page 7:

Linkage vs. Association

                       Linkage           Association/LD
genealogy              known             unknown
marker sharing         by descent        by state
# meioses              small             large
shared DNA segments    large             small
markers                microsatellites   SNP
marker density         low               high

Page 8:

LD

• Non-random association between alleles at different loci
• Many possible causes
  – mutation
  – drift / inbreeding / founder effects
  – population stratification
  – selection
• Broken down by recombination

Page 9:

Measures of pair-wise LD

Definition              Symbol
Covariance              D
Scaled covariance       D’
Association             ρ
Correlation             r
Frequency difference    f
Delta                   δ
Yule                    y

[Morton et al. 2001, PNAS]

Page 10:

Measures

• All estimates of LD are functions of pairwise haplotype frequencies

• ‘Best’ measure depends on purpose of LD estimation

Page 11:

Properties of ‘good’ measures

• Simple biological interpretation
• Allow statistical tests
• Directly related to evolutionary forces (recombination, selection, drift, etc.)
• Standardised to allow comparisons across loci & populations

[Hedrick 1987]

Page 12:

Utility of LD measures

• Population dynamics
• Estimating population size
• Gene/QTL mapping

Page 13:

Definition of D

• 2 bi-allelic loci
  – Locus 1, alleles A & a, with freq. $p$ and $(1-p)$
  – Locus 2, alleles B & b, with freq. $q$ and $(1-q)$
  – Haplotype frequencies $p_{AB}$, $p_{Ab}$, $p_{aB}$, $p_{ab}$

$D = p_{AB} - pq$

Page 14:

Alternative expressions for D

$D = p_{AB} - pq = D_{AB}$

$= p_{ab} - (1-p)(1-q) = D_{ab}$

$= -(p_{Ab} - p(1-q)) = -D_{Ab}$

$= -(p_{aB} - (1-p)q) = -D_{aB}$

$= p_{AB}\,p_{ab} - p_{Ab}\,p_{aB}$

Page 15:

Related measures

$D_{max}$ = smaller of $pq$ and $(1-p)(1-q)$   [D < 0]
          = smaller of $p(1-q)$ and $(1-p)q$   [D > 0]

• $D' = D / D_{max}$
  $-1 \le D' \le 1$
  Can compare pairs of loci across populations

Page 16:

|D’|

• $|D'| = |D| / D_{max}$
  $0 \le |D'| \le 1$
  – Can compare different pairs of loci in genome
  – $|D'| = 1$ if one of the 4 haplotype frequencies = 0
  – $E(|D'|)$ and $\mathrm{var}(|D'|)$ not known

Page 17:

$r^2$

$r^2 = D^2 / [pq(1-p)(1-q)]$

• Squared correlation between presence and absence of the alleles in the population

• ‘Nice’ statistical properties

[Hill and Robertson 1968]
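The three measures defined on the preceding slides (D, D’ and r²) can be computed directly from the four haplotype frequencies. The following is a minimal sketch, not part of the original slides; the function name `ld_measures` and its argument names are illustrative assumptions.

```python
def ld_measures(p_AB, p_Ab, p_aB, p_ab):
    """Return (D, D', r^2) for two bi-allelic loci from haplotype frequencies.

    Hypothetical helper for illustration; the names are not from the slides.
    """
    total = p_AB + p_Ab + p_aB + p_ab
    if abs(total - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")

    p = p_AB + p_Ab  # frequency of allele A
    q = p_AB + p_aB  # frequency of allele B

    D = p_AB - p * q

    # D_max depends on the sign of D, as on the 'Related measures' slide.
    if D < 0:
        d_max = min(p * q, (1 - p) * (1 - q))
    else:
        d_max = min(p * (1 - q), (1 - p) * q)
    d_prime = D / d_max if d_max > 0 else 0.0

    denom = p * q * (1 - p) * (1 - q)
    r2 = D * D / denom if denom > 0 else 0.0
    return D, d_prime, r2


D, d_prime, r2 = ld_measures(0.5, 0.0, 0.25, 0.25)
print(D, d_prime, r2)
```

In this example one haplotype (Ab) is absent, so |D’| = 1 even though r² is only about 1/3, which illustrates why the two measures answer different questions.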

Page 18:

Properties of $r$ and $r^2$

• Population in ‘equilibrium’
  $E(r) = 0$
  $E(r^2) = \mathrm{var}(r) \approx 1/[1 + 4Nc] + 1/n$

  N = effective population size
  n = sample size (haplotypes)
  c = recombination rate

• $n r^2 \sim \chi^2_{(1)}$

[Sved 1971; Weir and Hill 1980]
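As a quick numerical illustration of the expectation above, the sketch below evaluates 1/(1 + 4Nc), optionally adding the 1/n sampling term. The function name `expected_r2` and the example values of N, c and n are assumptions chosen for illustration only.

```python
def expected_r2(N, c, n=None):
    """Approximate E(r^2) at equilibrium: 1/(1 + 4Nc), plus 1/n if a
    sample size n (number of haplotypes) is given.

    Hypothetical helper; N = effective population size,
    c = recombination rate between the two loci.
    """
    value = 1.0 / (1.0 + 4.0 * N * c)
    if n is not None:
        value += 1.0 / n  # sampling contribution
    return value


# Illustrative values only: N = 1000, c = 0.001 (about 0.1 cM), n = 100
print(expected_r2(1000, 0.001))
print(expected_r2(1000, 0.001, n=100))
```

With a small sample the 1/n term can be a sizeable part of the observed r², so LD estimated from few haplotypes is biased upwards.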

Page 19:

Measures of LD are very variable!

• If $n r^2 \sim \chi^2$ (1 df), then

  $[\mathrm{CV}(r^2)]^2 \approx 2$

  $\mathrm{CV} = \sigma(r^2) / E(r^2) = (2/n^2)^{0.5} / (1/n) = 2^{0.5}$

Page 20:

$r^2$ decay under different scenarios (Pritchard & Przeworski, 2001)

y-axis: $(r^2)^{0.5}$

Page 21:

$r^2$ decay in real data (Pritchard & Przeworski, 2001)

Page 22:

Population stratification

           Allele frequency    Haplotype frequency
           pA1     pB1         pA1B1   pA1B2   pA2B1   pA2B2
Pop. 1     0.9     0.9         0.81    0.09    0.09    0.01
Pop. 2     0.1     0.1         0.01    0.09    0.09    0.81
Average    0.5     0.5         0.41    0.09    0.09    0.41

Both populations are in linkage equilibrium

Combined population: $D = 0.16$, $D' = 0.64$ and $r^2 = 0.4096$
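The numbers in the table can be reproduced with a few lines of code. This sketch assumes each subpopulation is in linkage equilibrium (so its A1B1 haplotype frequency is the product of its allele frequencies) and that the subpopulations are pooled with the given weights; the function name `pooled_ld` is hypothetical.

```python
def pooled_ld(subpops, weights=None):
    """LD created purely by mixing subpopulations.

    subpops: list of (p, q) allele-frequency pairs, one per subpopulation.
    Each subpopulation is assumed to be in linkage equilibrium.
    Returns (D, D', r^2) in the pooled population.
    """
    if weights is None:
        weights = [1.0 / len(subpops)] * len(subpops)

    p = sum(w * pi for w, (pi, _) in zip(weights, subpops))
    q = sum(w * qi for w, (_, qi) in zip(weights, subpops))
    p_AB = sum(w * pi * qi for w, (pi, qi) in zip(weights, subpops))

    D = p_AB - p * q
    if D < 0:
        d_max = min(p * q, (1 - p) * (1 - q))
    else:
        d_max = min(p * (1 - q), (1 - p) * q)
    d_prime = D / d_max if d_max > 0 else 0.0

    denom = p * q * (1 - p) * (1 - q)
    r2 = D * D / denom if denom > 0 else 0.0
    return D, d_prime, r2


# The two populations from the table, mixed in equal proportions
print(pooled_ld([(0.9, 0.9), (0.1, 0.1)]))
```

Up to floating-point rounding, this gives D = 0.16, D’ = 0.64 and r² = 0.4096, matching the slide: substantial LD appears in the pooled sample although neither subpopulation has any.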


Page 24:

Decay of LD (bi-allelic loci)

$D(0) = p_{AB}(0) - pq$
$p_{AB}(1) = (1-c)\,p_{AB}(0) + c\,pq$
$D(1) = (1-c)\,p_{AB}(0) + c\,pq - pq = (1-c)\,D(0)$

$D(t) = (1-c)^t D(0) \approx e^{-ct} D(0)$
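The recursion above is easy to evaluate numerically. The sketch below compares the exact geometric decay with the exponential approximation and computes the number of generations needed for D to halve; the function names are illustrative assumptions.

```python
import math


def ld_after(d0, c, t):
    """Exact decay: D(t) = (1 - c)^t * D(0)."""
    return (1.0 - c) ** t * d0


def ld_after_approx(d0, c, t):
    """Approximation for small c: D(t) ~ exp(-c t) * D(0)."""
    return math.exp(-c * t) * d0


def half_life(c):
    """Generations until D is halved, from (1 - c)^t = 0.5."""
    return math.log(0.5) / math.log(1.0 - c)


for c in (0.10, 0.01, 0.001):
    print(c, ld_after(1.0, c, 100), ld_after_approx(1.0, c, 100), half_life(c))
```

For c = 0.01 the half-life is roughly 69 generations, whereas for c = 0.10 LD has essentially vanished after 100 generations.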

Page 25:

Decay of LD

[Figure: LD (y-axis, 0.00 to 1.00) against generation (x-axis, 0 to 100) for c = 0.10, c = 0.01 and c = 0.001]

Page 26:

Approaches for quantitative traits

• Association (single locus)
• TDT
• GWAS

Page 27:

Association

• Random sample from the population
• Associate allelic or haplotype variant(s) with trait values
• H0: no association
• Analysis
  – Linear (mixed) model
  – Allele (haplotype) as fixed effect

Page 28:

Falconer model for single biallelic QTL

Var(X) = Regression Variance + Residual Variance
       = Additive Variance + Dominance Variance

[Figure: genotypic values of bb, Bb and BB on the trait scale, with labels m, -a, a and d]


Page 30:

TDT = transmission disequilibrium test

• Association with family-based controls
• Original TDT (disease mapping)
  – trios: two parents, one affected progeny
  – test for transmission of allele to affected progeny from heterozygous parents
    • non-transmitted allele is the control
• Quantitative traits
  – Test for association between trait value and allele within parental mating type

Page 31:

Classical TDT

Table 4.1: Transmission data for a bi-allelic marker.

                 Not transmitted
Transmitted      Allele 1     Allele 2     Total
Allele 1         n11          n12          n11 + n12
Allele 2         n21          n22          n21 + n22
Total            n11 + n21    n12 + n22    n

The TDT statistic is

$\mathrm{TDT} = (n_{21} - n_{12})^2 / (n_{21} + n_{12}) \sim \chi^2_1$

Test for both linkage and association
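The TDT statistic uses only the two off-diagonal counts of the table. A minimal sketch follows; the function name `tdt` and the example counts are made up for illustration, and the p-value uses the identity P(χ²₁ > x) = erfc(√(x/2)).

```python
import math


def tdt(n12, n21):
    """Classical TDT from the discordant transmission counts.

    n12: allele 1 transmitted, allele 2 not transmitted
    n21: allele 2 transmitted, allele 1 not transmitted
    Returns (chi-square statistic with 1 df, p-value).
    """
    total = n12 + n21
    if total == 0:
        raise ValueError("no informative (heterozygous-parent) transmissions")
    chi2 = (n21 - n12) ** 2 / total
    # Upper tail of a chi-square with 1 df via the complementary error function
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts: 100 informative transmissions, split 40 / 60
print(tdt(40, 60))
```

The diagonal cells n11 and n22 come from transmissions where both alleles of the parent are the same type, so they carry no information and drop out of the statistic.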

Page 32:

Quantitative traits

• With family data we can separate association into between and within family components
• Advantage
  – Within component is robust to stratification
• Disadvantage
  – Unrelated design is more powerful for same sample size

Page 33:

TDT for quantitative traits (regression model)

Page 34:

Information for Within Test

• Families are only informative for the within family component when the offspring can have different genotypes
  – AA x AA ?
  – AA x aa ?
  – AA x Aa ?
  – Aa x Aa ?
• At least one parent must be heterozygous
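The four mating types listed above can be checked by enumerating the possible offspring genotypes. This is a small illustrative sketch; the function names and the two-character genotype strings are assumptions made for the example.

```python
from itertools import product


def offspring_genotypes(parent1, parent2):
    """Set of possible offspring genotypes for two parental genotypes,
    each written as a two-character string such as "Aa"."""
    return {"".join(sorted(a + b)) for a, b in product(parent1, parent2)}


def informative_within_family(parent1, parent2):
    """A mating is informative for the within-family test only if
    offspring can differ in genotype."""
    return len(offspring_genotypes(parent1, parent2)) > 1


for mating in (("AA", "AA"), ("AA", "aa"), ("AA", "Aa"), ("Aa", "Aa")):
    print(mating, sorted(offspring_genotypes(*mating)),
          informative_within_family(*mating))
```

Only the matings with at least one heterozygous parent (AA x Aa and Aa x Aa) produce more than one offspring genotype, in line with the final bullet.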

Page 35:

Population Stratification

• When there is no population stratification the slopes for the between and within test should be equal

Page 36:

Power (bi-allelic locus)

$q^2 = \{2p(1-p)[a + d(1-2p)]^2 + [2p(1-p)d]^2\} / \sigma_p^2$

ANOVA: fit 3 genotypes (2 df)
  $\lambda_{ANOVA} = n q^2 / (1 - q^2)$

Regression: $y = \mu + \beta x + e$, with $x = 0, 1, 2$ (1 df)
  $n = [(1 - q^2)/q^2]\,(z_{1-\alpha/2} + z_{1-\beta})^2$
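The two formulas on this slide can be combined into a short sample-size calculation. The sketch below is illustrative only: the function names, default significance level and power, and the example parameter values are assumptions, and 1 - β is taken to be the desired power.

```python
from statistics import NormalDist


def qtl_variance_fraction(p, a, d, sigma_p2):
    """q^2: proportion of phenotypic variance explained by the locus,
    i.e. (additive + dominance variance) / phenotypic variance."""
    additive = 2 * p * (1 - p) * (a + d * (1 - 2 * p)) ** 2
    dominance = (2 * p * (1 - p) * d) ** 2
    return (additive + dominance) / sigma_p2


def required_n(q2, alpha=0.05, power=0.80):
    """Approximate sample size for the 1-df regression test:
    n = [(1 - q^2)/q^2] * (z_{1-alpha/2} + z_{1-beta})^2."""
    z = NormalDist()  # standard normal
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return (1 - q2) / q2 * (z_alpha + z_beta) ** 2


# Hypothetical additive locus: p = 0.5, a = 1, d = 0, phenotypic variance 50
q2 = qtl_variance_fraction(0.5, 1.0, 0.0, 50.0)
print(q2, required_n(q2))
print(required_n(q2, alpha=5e-8))  # a much stricter per-test threshold
```

A locus explaining 1% of the variance needs a sample in the high hundreds at α = 0.05, and several thousand once the threshold is tightened to allow for many tests.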

Page 37:

Genetic Power Calculator (GPC)
http://pngu.mgh.harvard.edu/~purcell/gpc/

Page 38:

Power (n=1000)

Page 39:

GWAS

• Same principle as single locus association, but additional information
  – QC
    • Duplications, sample swaps, contamination
  – Power of multi-locus data
    • Unbiased genome-wide association
    • Relatedness
    • Population structure
    • Ancestry

Page 40:

Detection of susceptibility variants for common diseases

[Figure: penetrance (y-axis: low, modest, intermediate, high) against allele frequency (x-axis: very rare, rare, uncommon, common; ticks at 0.001, 0.01, 0.1). Labelled regions: Linkage (requires same locus in many families or large pedigrees); Association; CNV studies; Sequencing; Not yet detectable; Atypical of common diseases]

Modified from McCarthy et al.

Page 41:

Detection of susceptibility variants for common diseases: the new era

[Figure: penetrance (y-axis: low, modest, intermediate, high) against allele frequency (x-axis: very rare, rare, uncommon, common; ticks at 0.001, 0.01, 0.1). Labelled regions: Variants typically identified by GWAS; CNV/sequencing; CNV studies; Sequencing; Not yet detectable; Atypical of common diseases]

Page 42:

Sequence - SNPs

LD

Technology

Page 43:

GWAS in humans

Genome-wide association scans
• Advances in genotyping technology
• Sample collections of adequate size
• Better understanding of patterns of human sequence variation

• 3,000,000,000 bases in human genome
• ~10,000,000 positions commonly variant in Europeans
• 80% of these captured by typing ~500k
• Samples of interest: test for evidence of association

Page 44:

• Categorical traits
  – disease susceptibility genes
• Continuous traits
  – quantitative trait loci, QTL

Page 45:

Age-related macular degeneration

Page 46:

GWAS analysis

Challenges
• most obviously, multiple testing burden
• computation

Opportunities
• simple methods can work well with ↑ data
• novel analyses permitted

Page 47:

The multiple testing burden

[Figure: P(at least 1 false positive) (y-axis, 0 to 1) against number of independent tests performed (x-axis, 0 to 50), for a per-test false positive rate of 0.05 and a per-test false positive rate of 0.001 = 0.05/50]
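The two curves in this figure follow from the probability of at least one false positive among m independent tests, 1 - (1 - α)^m. A minimal sketch, with illustrative function names:

```python
def familywise_error(alpha, m):
    """P(at least one false positive) over m independent tests,
    each with per-test false positive rate alpha."""
    return 1.0 - (1.0 - alpha) ** m


def bonferroni_threshold(alpha_family, m):
    """Per-test threshold that keeps the family-wise rate near alpha_family."""
    return alpha_family / m


m = 50
print(familywise_error(0.05, m))                           # uncorrected
print(familywise_error(bonferroni_threshold(0.05, m), m))  # per-test 0.001
```

With 50 independent tests the uncorrected rate is above 0.9, while the corrected per-test rate of 0.001 keeps the overall rate just under 0.05.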

Page 48:

Genomic control

[Figure: $\chi^2$ at the test locus compared with $E(\chi^2)$ over unlinked ‘null’ markers, in two panels: no stratification; stratification → adjust test statistic]

Page 49:

Genomic control

Simple estimate of inflation factor

$\hat{\lambda} = \mathrm{median}\{\chi^2_1, \chi^2_2, \ldots, \chi^2_N\} / 0.456$

• median protects from outliers
  – i.e. true effects
• bounded at minimum of 1
  – i.e. should never increase test statistic
• extends to multiple alleles, haplotypes, quantitative traits, different tests, etc.
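The inflation-factor estimate can be sketched in a few lines. The function name, argument names and the toy chi-square values below are assumptions for illustration; the constant 0.456 is the one given on the slide (the approximate median of a χ² distribution with 1 df).

```python
from statistics import median


def genomic_control(null_chi2, test_chi2):
    """Estimate lambda from 1-df chi-square statistics at 'null' markers
    and rescale the test statistics.

    lambda = median(null_chi2) / 0.456, bounded below at 1 so that
    the correction never increases a test statistic.
    """
    lam = median(null_chi2) / 0.456
    lam = max(lam, 1.0)
    adjusted = [x / lam for x in test_chi2]
    return lam, adjusted


# Toy values only: the null markers are inflated about two-fold
lam, adjusted = genomic_control([0.2, 0.912, 3.0], [4.0, 10.0])
print(lam, adjusted)
```

Using the median rather than the mean means that a handful of very large statistics from true associations hardly affects the estimate.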

Page 50:

Empirical assessment of ancestry

~2K SNPs

CEPH/European, Yoruba, Han Chinese, Japanese

Page 51:

Empirical assessment of ancestry

Entire Phase I HapMap

Page 52:

Empirical assessment of ancestry

~10K SNPs

Han Chinese, Japanese

