
Post on 19-Dec-2015


Gene-gene and gene-environment interactions

Manuel Ferreira

Massachusetts General Hospital

Harvard Medical School

Center for Human Genetic Research

Slides can be found at:

http://pngu.mgh.harvard.edu/~mferreira/

Outline

1. G-G and G-E interactions in the context of gene mapping

2. What is epistasis?

3. Study designs and tests to detect epistasis

4. Application to genome-wide datasets

1. G-G and G-E in context

Chromosome 4 DNA sequence with a SNP (single nucleotide polymorphism):

…GGCGGTGTTCCGGGCCATCACCATTGCGGGCCGGATCAACTGCCCTGTGTACATCACCAAGGTCATGAGCAAGAGTGCAGCCGACATCATCGCTCTGGCCAGGAAGAAAGGGCCCCTAGTTTTTGGAGAGCCCATTGCCGCCAGCCTGGGGACCGATGGCACCCATTACTGGAGCAAGAACTGGGCCAAGGCTGCGGCGTTCGTGACTTCCCCTCCCCTGAGCCCGGACCCTACCACGCCCGACTA…

The human genome: find the disease-causing variation.

Gene-environment interaction (G × E)

[Figure: a gene effect and an environmental effect on a trait. Figure © S. Purcell.]

G × E interaction: the environment modifies the effect of a gene; a gene modifies the effect of an environment.

Epistasis (gene × gene interaction)

[Figure: the effects of two genes on a trait. Figure © S. Purcell.]

Epistasis: one gene modifies the effect of another.

2. Definition(s) of epistasis

Epistasis or not?

      AA  Aa  aa
BB     1   1   3
Bb     2   2   4
bb     3   3   5

Definitions of epistasis:
Biological: an individual-level phenomenon.
Statistical: a population-level phenomenon.
(Figure © S. Purcell)

[Figure: pigment pathway in which Gene RED and Gene YELLOW determine Pigment 1, Pigment 2 and the final pigment, shown for each two-locus genotype (AA, Aa, aa × BB, Bb, bb).]

Bateson (1909)

Introduced the concept of epistasis as a “masking effect”, whereby a variant or allele at one locus prevents the variant at another locus from manifesting its effect.

A Mendelian concept, closer to the biological definition of interaction between two molecules.

Fisher (1918)

Epistasis defined as the extent to which the joint contribution of two alleles in different loci towards a phenotype deviates from that expected under a purely additive model.

Example (genotypic values; columns AA, Aa, aa; rows are two locus B genotypes):
Expected under a purely additive model: 0 2 2 / 1 3 3
Observed: 0 2 2 / 1 2 2

A mathematical concept, closer to the statistical definition of interaction between two variables on a linear scale.
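Fisher's comparison of observed and expected values can be written as a short calculation. The sketch below assumes the first row and first column act as reference genotypes; the function names are illustrative, not taken from the slides. The expected value of each cell is the reference value plus that row's effect plus that column's effect, and epistasis is the observed-minus-expected deviation.

```python
def additive_expectation(table):
    """Expected cell values if the two loci act additively:
    reference value + row effect + column effect."""
    ref = table[0][0]
    return [[table[i][0] + table[0][j] - ref for j in range(len(table[0]))]
            for i in range(len(table))]


def epistatic_deviation(table):
    """Observed minus additive expectation; all zeros means no epistasis."""
    expected = additive_expectation(table)
    return [[obs - exp for obs, exp in zip(obs_row, exp_row)]
            for obs_row, exp_row in zip(table, expected)]


# Fisher example: columns AA, Aa, aa; two locus B genotypes as rows.
observed = [[0, 2, 2],
            [1, 2, 2]]

print(additive_expectation(observed))  # [[0, 2, 2], [1, 3, 3]]
print(epistatic_deviation(observed))   # [[0, 0, 0], [0, -1, -1]]
```

The nonzero deviations in the second row are the part of the phenotype that a purely additive model cannot explain.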

Dominance is defined as the extent to which the joint contribution of two alleles in the same locus towards a phenotype deviates from that expected under a purely additive model.

[Figure: genotypic mean (0 to 2) for AA, Aa and aa under additive, dominant and recessive models.]

Epistasis is very similar: a deviation from additivity between loci rather than within a locus.
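Within a locus, the departure from additivity is usually summarised by the dominance deviation: how far the heterozygote's mean lies from the midpoint of the two homozygotes. A minimal sketch, using the genotypic means 0 1 2 (additive), 0 2 2 (dominant) and 0 0 2 (recessive) that appear in the locus A tables of this deck; `dominance_deviation` is an illustrative name.

```python
def dominance_deviation(mu_AA, mu_Aa, mu_aa):
    """Heterozygote mean minus the additive expectation (homozygote midpoint)."""
    midpoint = (mu_AA + mu_aa) / 2
    return mu_Aa - midpoint


# Genotypic means for AA, Aa, aa under the three within-locus models.
models = {
    "additive": (0, 1, 2),
    "dominant": (0, 2, 2),   # Aa behaves like aa
    "recessive": (0, 0, 2),  # Aa behaves like AA
}
for name, means in models.items():
    print(name, dominance_deviation(*means))
# additive 0.0
# dominant 1.0
# recessive -1.0
```

Epistasis applies the same idea across loci: compare a two-locus genotype with what adding the single-locus effects would predict.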

Within locus vs between loci

[Figure: genotypic means (0 to 4) for AA, Aa and aa within each of BB, Bb and bb, for combinations of additive effects and no effect at locus A and locus B.]

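Between loci: additive (i.e. NO epistasis)

Genotypic means for every combination of within-locus models. In each table the columns are AA, Aa, aa and the rows are BB / Bb / bb (separated by slashes):

Locus A additive, locus B additive: 0 1 2 / 1 2 3 / 2 3 4
Locus A dominant, locus B additive: 0 2 2 / 1 3 3 / 2 4 4
Locus A recessive, locus B additive: 0 0 2 / 1 1 3 / 2 2 4
Locus A additive, locus B dominant: 0 1 2 / 2 3 4 / 2 3 4
Locus A dominant, locus B dominant: 0 2 2 / 1 3 3 / 1 3 3
Locus A recessive, locus B dominant: 0 0 2 / 2 2 4 / 2 2 4
Locus A additive, locus B recessive: 0 1 2 / 0 1 2 / 2 3 4
Locus A dominant, locus B recessive: 0 2 2 / 0 2 2 / 1 3 3
Locus A recessive, locus B recessive: 0 0 2 / 0 0 2 / 2 2 4

Whatever the model within each locus, moving from one locus B genotype to another shifts all three locus A genotypes by the same amount, so the two loci combine additively.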
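Additivity between loci can be checked for any complete table by fitting grand mean + row effects + column effects and looking at what is left over. A minimal sketch, assuming a balanced 3 × 3 table so that the least-squares additive fit is given by row and column means; the helper names are illustrative.

```python
def interaction_residuals(table):
    """Residuals after removing the grand mean, row effects and column effects."""
    n_rows, n_cols = len(table), len(table[0])
    grand = sum(sum(row) for row in table) / (n_rows * n_cols)
    row_means = [sum(row) / n_cols for row in table]
    col_means = [sum(row[j] for row in table) / n_rows for j in range(n_cols)]
    return [[table[i][j] - row_means[i] - col_means[j] + grand
             for j in range(n_cols)]
            for i in range(n_rows)]


def is_additive(table, tol=1e-9):
    """True if every residual is zero, up to floating-point tolerance."""
    return all(abs(r) < tol for row in interaction_residuals(table) for r in row)


# Genotypic means keyed by (locus A model, locus B model);
# columns AA, Aa, aa and rows BB, Bb, bb.
tables = {
    ("additive", "additive"): [[0, 1, 2], [1, 2, 3], [2, 3, 4]],
    ("dominant", "additive"): [[0, 2, 2], [1, 3, 3], [2, 4, 4]],
    ("recessive", "additive"): [[0, 0, 2], [1, 1, 3], [2, 2, 4]],
    ("additive", "dominant"): [[0, 1, 2], [2, 3, 4], [2, 3, 4]],
    ("dominant", "dominant"): [[0, 2, 2], [1, 3, 3], [1, 3, 3]],
    ("recessive", "dominant"): [[0, 0, 2], [2, 2, 4], [2, 2, 4]],
    ("additive", "recessive"): [[0, 1, 2], [0, 1, 2], [2, 3, 4]],
    ("dominant", "recessive"): [[0, 2, 2], [0, 2, 2], [1, 3, 3]],
    ("recessive", "recessive"): [[0, 0, 2], [0, 0, 2], [2, 2, 4]],
}

for models, table in tables.items():
    print(models, is_additive(table))  # True for all nine tables
```

Dominance or recessiveness within a locus does not create epistasis: every residual is zero.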

Between loci: non-additive (i.e. epistasis)

Examples of genotypic means (columns AA, Aa, aa; rows BB / Bb / bb):
0 0 0 / 0 1 1 / 0 1 1
0 0 0 / 0 1 2 / 0 2 3
0 0 0 / 0 0 1 / 0 1 2
0 0 0 / 0 0 0 / 0 0 4
2 2 4 / 2 4 2 / 4 2 2
1 1 1 / 1 1 1 / 1 1 8

In each of these tables the effect of locus A differs between locus B genotypes.

Epistasis or not?

      AA  Aa  aa
BB     1   1   3
Bb     2   2   4
bb     3   3   5
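Another way to read these tables follows the "one gene modifies the effect of another" definition: compute the effect of locus A (the difference from AA) separately within each locus B genotype. A minimal sketch with illustrative helper names; under additivity between loci these within-row effects are identical in every row.

```python
def a_effects_by_b(table):
    """For each locus B genotype (row), return (Aa - AA, aa - AA)."""
    return [(row[1] - row[0], row[2] - row[0]) for row in table]


def a_effect_depends_on_b(table):
    """True if the effect of locus A changes with the locus B genotype."""
    effects = a_effects_by_b(table)
    return any(e != effects[0] for e in effects[1:])


# Non-additive examples (columns AA, Aa, aa; rows BB, Bb, bb).
epistatic_examples = [
    [[0, 0, 0], [0, 1, 1], [0, 1, 1]],
    [[0, 0, 0], [0, 1, 2], [0, 2, 3]],
    [[0, 0, 0], [0, 0, 1], [0, 1, 2]],
    [[0, 0, 0], [0, 0, 0], [0, 0, 4]],
    [[2, 2, 4], [2, 4, 2], [4, 2, 2]],
    [[1, 1, 1], [1, 1, 1], [1, 1, 8]],
]
# The "Epistasis or not?" table.
question = [[1, 1, 3], [2, 2, 4], [3, 3, 5]]

for table in epistatic_examples:
    print(a_effects_by_b(table), a_effect_depends_on_b(table))
print(a_effects_by_b(question), a_effect_depends_on_b(question))
# last line: [(0, 2), (0, 2), (0, 2)] False
```

On this scale the "Epistasis or not?" table shows no epistasis: the aa versus AA difference is 2 in every row.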

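Statistical definition of epistasis is scale dependent

We defined epistasis as a departure from an additive model across loci. Crucial assumption: genotype effects are measured on the appropriate scale.

For the "Epistasis or not?" table, the original values show no departure from additivity, whereas their logarithms do depart from additivity:

Original scale (x)              log(x)
      AA  Aa  aa                     AA    Aa    aa
BB     1   1   3               BB   0.00  0.00  1.10
Bb     2   2   4               Bb   0.69  0.69  1.39
bb     3   3   5               bb   1.10  1.10  1.61

On the original scale the aa versus AA difference is 2 in every row; on the log scale it is ln 3 ≈ 1.10, ln 2 ≈ 0.69 and ln(5/3) ≈ 0.51.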
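Scale dependence is easy to demonstrate numerically: apply the same additivity check to the raw table and to its logarithm. A minimal sketch, assuming the additive fit is the row-and-column-means fit for a balanced table; `interaction_residuals` and `max_abs_interaction` are illustrative names.

```python
import math


def interaction_residuals(table):
    """Residuals from an additive (row + column) fit; all zero means no epistasis."""
    n_rows, n_cols = len(table), len(table[0])
    grand = sum(sum(row) for row in table) / (n_rows * n_cols)
    row_means = [sum(row) / n_cols for row in table]
    col_means = [sum(row[j] for row in table) / n_rows for j in range(n_cols)]
    return [[table[i][j] - row_means[i] - col_means[j] + grand
             for j in range(n_cols)] for i in range(n_rows)]


def max_abs_interaction(table):
    return max(abs(r) for row in interaction_residuals(table) for r in row)


# "Epistasis or not?" table: columns AA, Aa, aa; rows BB, Bb, bb.
means = [[1, 1, 3], [2, 2, 4], [3, 3, 5]]
log_means = [[math.log(x) for x in row] for row in means]

print([[round(v, 2) for v in row] for row in log_means])
# [[0.0, 0.0, 1.1], [0.69, 0.69, 1.39], [1.1, 1.1, 1.61]]
print(max_abs_interaction(means) < 1e-9)     # True: additive on the original scale
print(max_abs_interaction(log_means) > 0.1)  # True: non-additive after log(x)
```

The genotype values are unchanged; only the scale on which additivity is judged differs.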

Disease trait: each two-locus genotype has a penetrance (p), a relative risk (RR) or an odds ratio (OR), for example penetrances:

          AA         Aa         aa
BB    p(AABB)    p(AaBB)    p(aaBB)
Bb    p(AABb)    p(AaBb)    p(aaBb)
bb    p(AAbb)    p(Aabb)    p(aabb)

Relative risks RR(AABB) … RR(aabb) and odds ratios OR(AABB) … OR(aabb) are laid out in the same way.

Continuous trait: each two-locus genotype has a genotype mean, μ(AABB) … μ(aabb), in the same layout.

Genotype effects measured on:      Epistasis defined as departure from:
Penetrance scale                   Additive model
Linear scale (genotype means)      Additive model
RR scale                           Multiplicative model
OR scale                           Multiplicative model

Additive: y = LocusA + LocusB
Multiplicative: y = LocusA × LocusB
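The same penetrances can show epistasis on one scale and none on another. A minimal sketch with hypothetical numbers (not from the slides): a 2 × 2 collapse of the two-locus table into non-carrier / carrier, built to follow a multiplicative model with baseline penetrance 0.01, relative risk 2 for locus A carriers and 3 for locus B carriers.

```python
# Hypothetical penetrances P(disease | genotype): rows = B non-carrier / carrier,
# columns = A non-carrier / carrier. Built to be multiplicative on the RR scale.
baseline, rr_a, rr_b = 0.01, 2.0, 3.0
penetrance = [[baseline, baseline * rr_a],
              [baseline * rr_b, baseline * rr_a * rr_b]]


def additive_contrast(t):
    """t11 - t10 - t01 + t00: zero when the two effects add."""
    return t[1][1] - t[1][0] - t[0][1] + t[0][0]


def multiplicative_ratio(t):
    """(t11 * t00) / (t10 * t01): one when the two effects multiply."""
    return (t[1][1] * t[0][0]) / (t[1][0] * t[0][1])


odds = [[p / (1 - p) for p in row] for row in penetrance]

print(round(additive_contrast(penetrance), 4))     # 0.02: departure on the penetrance scale
print(round(multiplicative_ratio(penetrance), 4))  # 1.0: no departure on the RR scale
print(round(multiplicative_ratio(odds), 4))        # close to, but above, 1 on the OR scale
```

The OR scale only approximates the RR scale here; the two agree more closely as the disease becomes rarer.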

3. Designs and methods to detect epistasis

Study designs: family-based, case-control, case-only. Family-based designs are more robust and rely on fewer assumptions; moving towards case-only designs, tests become more efficient and powerful.

Methods

1. Regression

2. “Linkage disequilibrium” or allelic association

3. Transmission distortion

Methods: 1. Regression

y = m1·LocusA + m2·LocusB + m3·(LocusA × LocusB)

which can be rearranged as

y = (m1 + m3·LocusB)·LocusA + m2·LocusB

The effect of LocusA on y is modified by LocusB.

y is a continuous trait: linear regression. y is a disease trait: logistic regression.
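The interaction regression for a continuous trait can be sketched with ordinary least squares. This is a minimal, dependency-free illustration under stated assumptions: simulated data with hypothetical true values m1 = 0.3, m2 = 0.2, m3 = 0.5, allele counts coded 0/1/2, and a small Gaussian-elimination solver standing in for a statistics library. It is not the analysis used in the slides.

```python
import random


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(rows, y):
    """Least-squares coefficients from the normal equations X'X beta = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


rng = random.Random(42)
true_m = (0.0, 0.3, 0.2, 0.5)  # hypothetical intercept, m1, m2, m3
rows, y = [], []
for _ in range(5000):
    a = sum(rng.random() < 0.3 for _ in range(2))  # LocusA allele count
    b = sum(rng.random() < 0.4 for _ in range(2))  # LocusB allele count
    x = [1.0, a, b, a * b]                         # intercept, A, B, A x B
    rows.append(x)
    y.append(sum(coef * xi for coef, xi in zip(true_m, x)) + rng.gauss(0, 1))

m0, m1, m2, m3 = ols(rows, y)
print("m3 (interaction):", round(m3, 2))
for b in (0, 1, 2):
    # Effect of LocusA on y at each LocusB genotype: m1 + m3 * LocusB.
    print("LocusB =", b, "-> effect of LocusA:", round(m1 + m3 * b, 2))
```

The printed per-genotype effects are the rearranged form (m1 + m3·LocusB) of the model above.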

Methods: 1. Regression

y = m1·LocusA + m2·Env + m3·(LocusA × Env)

which can be rearranged as

y = (m1 + m3·Env)·LocusA + m2·Env

The effect of LocusA on y is modified by Env.
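For a disease trait, the same G × E model is fitted on the log-odds scale by logistic regression. A minimal sketch under stated assumptions: simulated data with hypothetical true log-odds coefficients (intercept -2, G 0.2, Env 0.3, G × Env 0.6), a binary exposure with frequency 0.3, and Newton-Raphson (IRLS) fitting written out by hand instead of a statistics package.

```python
import math
import random


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def logistic_fit(rows, y, max_iter=50, tol=1e-8):
    """Maximum-likelihood logistic regression by Newton-Raphson (IRLS)."""
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(max_iter):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for x, yi in zip(rows, y):
            eta = sum(bj * xj for bj, xj in zip(beta, x))
            p = 1.0 / (1.0 + math.exp(-eta))
            w = p * (1.0 - p)
            for i in range(k):
                grad[i] += (yi - p) * x[i]
                for j in range(k):
                    hess[i][j] += w * x[i] * x[j]
        step = solve(hess, grad)  # (X'WX)^-1 X'(y - p)
        beta = [bj + sj for bj, sj in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


rng = random.Random(7)
true_beta = [-2.0, 0.2, 0.3, 0.6]  # hypothetical: intercept, G, Env, G x Env
rows, y = [], []
for _ in range(20000):
    g = sum(rng.random() < 0.3 for _ in range(2))  # allele count (0, 1, 2)
    env = 1 if rng.random() < 0.3 else 0           # binary exposure
    x = [1.0, g, env, g * env]
    eta = sum(b * xi for b, xi in zip(true_beta, x))
    y.append(1 if rng.random() < 1.0 / (1.0 + math.exp(-eta)) else 0)
    rows.append(x)

beta = logistic_fit(rows, y)
print("estimated log-odds coefficients:", [round(b, 2) for b in beta])
```

Here m3 is a departure from additivity on the log-odds scale, i.e. from a multiplicative model for the odds.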

Methods 2. LD-based

Epistasis induces “LD” in cases, even for unlinked loci.

Example with p(a) = 0.2 and p(b) = 0.2, no epistasis:

Epistasis model (relative penetrances; columns AA, Aa, aa; rows BB / Bb / bb): 1 1 1 / 1 1 1 / 1 1 1
Genotype frequencies, cases and controls: .41 .21 .02 / .21 .10 .01 / .03 .01 .00
“Haplotype frequencies”, cases and controls: AB .640, Ab .160, aB .160, ab .040
“LD”: ~0 in cases and ~0 in controls
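The no-epistasis frequencies follow from Hardy-Weinberg and linkage equilibrium. A minimal sketch, assuming unlinked loci with q_a = p(a) = 0.2 and q_b = p(b) = 0.2 as in the example; the helper names are illustrative. Computed values will not necessarily match the slide's rounding exactly.

```python
def genotype_table(q_a, q_b):
    """Two-locus genotype frequencies for unlinked loci, assuming Hardy-Weinberg
    and linkage equilibrium. Rows BB, Bb, bb; columns AA, Aa, aa."""
    p_a, p_b = 1 - q_a, 1 - q_b
    freq_a = [p_a ** 2, 2 * p_a * q_a, q_a ** 2]  # AA, Aa, aa
    freq_b = [p_b ** 2, 2 * p_b * q_b, q_b ** 2]  # BB, Bb, bb
    return [[fb * fa for fa in freq_a] for fb in freq_b]


def haplotype_freqs(q_a, q_b):
    """Haplotype frequencies under linkage equilibrium (products of allele frequencies)."""
    p_a, p_b = 1 - q_a, 1 - q_b
    return {"AB": p_a * p_b, "Ab": p_a * q_b, "aB": q_a * p_b, "ab": q_a * q_b}


def ld_d(h):
    """Linkage disequilibrium coefficient D = p(AB) p(ab) - p(Ab) p(aB)."""
    return h["AB"] * h["ab"] - h["Ab"] * h["aB"]


table = genotype_table(0.2, 0.2)
haps = haplotype_freqs(0.2, 0.2)
for row in table:
    print([round(f, 3) for f in row])
print({k: round(v, 3) for k, v in haps.items()})
print(abs(ld_d(haps)) < 1e-12)  # True: no LD when there is no epistasis
```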

Methods 2. LD-based (continued)

Same example (p(a) = 0.2, p(b) = 0.2) with epistasis:

Epistasis model (relative penetrances; columns AA, Aa, aa; rows BB / Bb / bb): 1 1 1 / 1 1 1 / 1 1 20
Genotype frequencies, cases: .40 .20 .03 / .20 .10 .01 / .03 .01 .02
Genotype frequencies, controls: .41 .21 .02 / .21 .10 .01 / .03 .01 .00
“Haplotype frequencies”, cases: AB .630, Ab .158, aB .158, ab .054
“Haplotype frequencies”, controls: AB .640, Ab .160, aB .160, ab .040
“LD”: ~0.05 in cases, ~0 in controls

Epistasis induces “LD” in cases, even for unlinked loci.
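Why epistasis creates “LD” among cases can be shown by weighting each haplotype pair by its relative penetrance. A minimal sketch under stated assumptions: two haplotypes per person drawn independently from a population in linkage equilibrium, p(a) = p(b) = 0.2, relative penetrance 20 for aabb and 1 otherwise, and a rare disease so that controls resemble the population. It illustrates the qualitative point (D in cases > 0) rather than reproducing the slide's rounded figures.

```python
from itertools import product


def case_haplotype_freqs(q_a, q_b, rel_pen):
    """Haplotype frequencies among cases, with disease risk proportional to
    rel_pen[number of b alleles][number of a alleles] (rows BB, Bb, bb;
    columns AA, Aa, aa)."""
    p_a, p_b = 1 - q_a, 1 - q_b
    pool = {"AB": p_a * p_b, "Ab": p_a * q_b, "aB": q_a * p_b, "ab": q_a * q_b}
    counts = dict.fromkeys(pool, 0.0)
    total = 0.0
    for h1, h2 in product(pool, repeat=2):
        n_a = (h1[0] == "a") + (h2[0] == "a")  # copies of allele a
        n_b = (h1[1] == "b") + (h2[1] == "b")  # copies of allele b
        weight = pool[h1] * pool[h2] * rel_pen[n_b][n_a]
        counts[h1] += weight
        counts[h2] += weight
        total += 2 * weight
    return {h: c / total for h, c in counts.items()}


def ld_d(h):
    """D = p(AB) p(ab) - p(Ab) p(aB)."""
    return h["AB"] * h["ab"] - h["Ab"] * h["aB"]


no_epistasis = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
epistasis = [[1, 1, 1], [1, 1, 1], [1, 1, 20]]  # only aabb has raised risk

for label, model in (("no epistasis", no_epistasis), ("epistasis", epistasis)):
    cases = case_haplotype_freqs(0.2, 0.2, model)
    print(label, {h: round(f, 3) for h, f in cases.items()},
          "D in cases:", round(ld_d(cases), 4))
```

The loci are unlinked throughout; the association among cases comes only from selecting on a genotype-dependent risk.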

Two-locus genotypes

Locus A: alleles A (frequency pA) and a (frequency qA), with pA + qA = 1
Locus B: alleles B (frequency pB) and b (frequency qB), with pB + qB = 1

               AA (pA²)    Aa (2pAqA)    aa (qA²)
BB (pB²)        AABB         AaBB          aaBB
Bb (2pBqB)      AABb         AaBb          aaBb
bb (qB²)        AAbb         Aabb          aabb

A two-locus genotype (e.g. AAbb) is not the same thing as a haplotype (e.g. Ab), although AAbb can only arise from the haplotype pair Ab / Ab.
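The link between two-locus genotypes and haplotype pairs can be made explicit by enumerating all pairs of the four haplotypes. A minimal sketch with illustrative names: every genotype maps to a single haplotype pair except the double heterozygote AaBb, whose phase (AB/ab or Ab/aB) cannot be read from the genotype.

```python
from collections import defaultdict
from itertools import combinations_with_replacement

HAPLOTYPES = ["AB", "Ab", "aB", "ab"]


def genotype_of(h1, h2):
    """Unphased two-locus genotype of a haplotype pair, e.g. ('Ab', 'Ab') -> 'AAbb'."""
    locus_a = "".join(sorted(h1[0] + h2[0]))  # 'A' sorts before 'a'
    locus_b = "".join(sorted(h1[1] + h2[1]))
    return locus_a + locus_b


phasings = defaultdict(list)
for h1, h2 in combinations_with_replacement(HAPLOTYPES, 2):
    phasings[genotype_of(h1, h2)].append(f"{h1}/{h2}")

for genotype, pairs in sorted(phasings.items()):
    print(genotype, pairs)
```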

Methods 2. LD-based

In the presence of epistasis:
• LD in cases > 0
• LD in cases > LD in controls

Statistics that measure the strength of association (δ) between two loci: LD (D, r²) or correlation.
Case-only: H0: δ = 0
Case-control: H0: δCases = δControls

[Figure: LD between genes in the 5q GABA cluster in schizophrenia (Scz) cases and in controls. Credit: Pamela Sklar, Tracey Petryshen, C&M Pato.]
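The two null hypotheses can be tested with simple statistics on a 2 × 2 table of haplotype counts. A minimal sketch under stated assumptions: hypothetical counts from 2000 case and 2000 control haplotypes (the case counts scale the case “haplotype frequencies” .630/.158/.158/.054), the standard result that N·r² is approximately chi-square with 1 df for the case-only test, and Fisher's z-transform of r as one possible choice (not necessarily the slides') for comparing cases with controls.

```python
import math


def r_from_counts(n_AB, n_Ab, n_aB, n_ab):
    """Allelic correlation r between two loci from haplotype counts; also returns N."""
    n = n_AB + n_Ab + n_aB + n_ab
    p_AB, p_Ab, p_aB, p_ab = (c / n for c in (n_AB, n_Ab, n_aB, n_ab))
    d = p_AB * p_ab - p_Ab * p_aB
    p_A, p_B = p_AB + p_Ab, p_AB + p_aB
    return d / math.sqrt(p_A * (1 - p_A) * p_B * (1 - p_B)), n


def chi2_1df_pvalue(x):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2))


def case_only_test(counts):
    """H0: delta = 0 in cases, using N * r^2 ~ chi-square(1)."""
    r, n = r_from_counts(*counts)
    stat = n * r ** 2
    return stat, chi2_1df_pvalue(stat)


def case_control_test(case_counts, control_counts):
    """H0: delta_cases = delta_controls, via Fisher's z-transform of r."""
    r1, n1 = r_from_counts(*case_counts)
    r2, n2 = r_from_counts(*control_counts)
    z = (math.atanh(r1) - math.atanh(r2)) / math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return z, math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value


cases = (1260, 316, 316, 108)     # hypothetical haplotype counts: AB, Ab, aB, ab
controls = (1280, 320, 320, 80)

print("case-only:", case_only_test(cases))
print("case-control:", case_control_test(cases, controls))
```

With these counts the case-only test reaches p < 0.05 while the case-control comparison does not, in line with the power advantage of case-only designs.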

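Methods 3. Transmission distortion

If the effect of locus A on disease risk is modified by locus B, transmission of locus A alleles from heterozygous (Aa) parents to affected probands differs according to the proband's locus B genotype, for example:
• BB probands: 50%
• Bb probands: 52%
• bb probands: 56%

The same applies for Env instead of locus B.

If variants A and B are in LD (common haplotypes AB / ab), transmission is distorted even without interaction: in the subset of bb probands, transmission goes to 100% for one locus A allele and 0% for the other, and the reverse happens in the subset of BB probands. This produces false-positive interactions (due to linkage or population stratification), so the TDT requires the assumption of independence between loci.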
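Stratifying the TDT by the proband's locus B genotype gives a simple interaction test. A minimal sketch under stated assumptions: hypothetical counts of 1000 informative transmissions per stratum, chosen to match the slide's 50%, 52% and 56%, with a Pearson chi-square on the k × 2 table of (transmitted, untransmitted) counts as one possible heterogeneity test. Like any TDT-based interaction test, it is only valid if the loci are independent.

```python
import math


def tdt(transmitted, untransmitted):
    """TDT (McNemar) chi-square, 1 df, from counts of one allele transmitted or
    not transmitted by heterozygous parents."""
    return (transmitted - untransmitted) ** 2 / (transmitted + untransmitted)


def heterogeneity_chi2(strata):
    """Pearson chi-square for a k x 2 table of (transmitted, untransmitted) counts,
    testing whether the transmission rate differs between strata (k - 1 df)."""
    total_t = sum(t for t, _ in strata)
    total_n = sum(t + u for t, u in strata)
    rate = total_t / total_n
    chi2 = 0.0
    for t, u in strata:
        n = t + u
        for observed, expected in ((t, n * rate), (u, n * (1 - rate))):
            chi2 += (observed - expected) ** 2 / expected
    return chi2


# Hypothetical counts: 1000 informative transmissions per proband B genotype.
strata = {"BB": (500, 500), "Bb": (520, 480), "bb": (560, 440)}
for genotype, (t, u) in strata.items():
    print(genotype, "transmission rate:", t / (t + u), "TDT:", round(tdt(t, u), 2))

het = heterogeneity_chi2(list(strata.values()))
print("heterogeneity chi-square (2 df):", round(het, 2),
      "p =", round(math.exp(-het / 2), 4))  # chi-square(2) upper tail = exp(-x / 2)
```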

Design & Methods

Designs: case-control, case-only, family-based.

Methods: regression, LD-based, TDT.

[Figure: power (%) without and with interaction, for 100 cases + 100 controls, 200 cases + 200 controls, 200 cases only, and 200 controls only.]

Case-only designs offer efficient detection of epistasis.

Case-only design isn’t always valid: an association between gene A and gene B in cases can also arise from
1. physical distance (linked loci), and
2. population substructure (stratification) in the case sample.

Pros and cons

Case-only: efficient, powerful; but relies on assumptions (unlinked loci, no stratification, etc).
Family-based: applicable to linked loci; but less efficient, and few methods efficiently handle relatives.
LD-based methods: fast, often more powerful; but less useful for continuous traits and/or family data.
Regression: many extensions possible (GxE, covariates, etc); but slow(er).
Software: PLINK.

4. Application to genome-wide datasets

# SNPs       # pairs
5            10
50           1,225
500          124,750
250,000      31,249,875,000
500,000      124,999,750,000

An “all pairs of SNPs” approach to epistasis does not scale well…

… but it is feasible: ~1 week running PLINK on ~200 CPUs, for >3000 individuals.
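The number of SNP pairs grows quadratically, as n(n − 1)/2. A minimal sketch that recomputes the table; `n_pairs` is an illustrative name.

```python
def n_pairs(n_snps):
    """Number of distinct SNP pairs: n choose 2."""
    return n_snps * (n_snps - 1) // 2


for n in (5, 50, 500, 250_000, 500_000):
    print(f"{n:>9,} SNPs -> {n_pairs(n):>19,} pairs")
```

Doubling the number of SNPs roughly quadruples the number of tests.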

Multiple testing increases false positives

[Figure: P(at least 1 false positive) against the number of independent tests performed (0 to 50), for a per-test false positive rate of 0.05 and of 0.001 = 0.05/50.]

# SNPs       # pairs              P-value needed
5            10                   5e-3
50           1,225                4e-5
500          124,750              4e-7
250,000      31,249,875,000       2e-12
500,000      124,999,750,000      4e-13

The p-value required for experiment-wide significance must be adjusted for the number of tests performed (here 0.05 divided by the number of pairs).
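Both the curve and the table follow from short formulas: for n independent tests at per-test rate α, P(at least one false positive) = 1 − (1 − α)^n, and the thresholds above equal 0.05 divided by the number of pairs (a Bonferroni correction). A minimal sketch with illustrative names.

```python
def p_any_false_positive(alpha, n_tests):
    """Probability of at least one false positive among n independent tests."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test p-value giving an experiment-wide error rate of at most alpha."""
    return alpha / n_tests


print(round(p_any_false_positive(0.05, 50), 3))       # 0.923
print(round(p_any_false_positive(0.05 / 50, 50), 3))  # 0.049
for n_snps in (5, 50, 500, 250_000, 500_000):
    pairs = n_snps * (n_snps - 1) // 2
    print(n_snps, pairs, f"{bonferroni_threshold(pairs):.0e}")
```

With 50 tests at 0.05 a false positive is almost certain, whereas 0.05/50 per test keeps the overall rate just under 0.05.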

Genome-wide epistasis screen in bipolar disorder

[Figure: screen results for chromosome 13 against chromosomes 1 to 22.]

For two genes with 10 SNPs (A to J) and 8 SNPs (1 to 8), testing every SNP pair means 80 allele-based tests (A1, A2, …, J8); a gene-based approach replaces them with a single gene-based test.
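The 80 allele-based tests are all SNP pairs across the two genes. One simple way to turn them into a single gene-level result (an assumption for illustration, not necessarily the approach behind the slide) is a Šidák-corrected minimum p-value. It assumes the 80 tests are independent; SNPs in LD make them correlated, so the correction is then only approximate. The p-values below are hypothetical.

```python
from itertools import product

snps_gene1 = list("ABCDEFGHIJ")              # 10 SNPs in the first gene
snps_gene2 = [str(i) for i in range(1, 9)]   # 8 SNPs in the second gene

pairs = [a + b for a, b in product(snps_gene1, snps_gene2)]
print(len(pairs), pairs[:3], pairs[-1])  # 80 ['A1', 'A2', 'A3'] J8


def sidak_min_p(p_values):
    """Gene-level p-value from the smallest of k pair-wise p-values,
    assuming the k tests are independent (Sidak correction)."""
    k = len(p_values)
    return 1 - (1 - min(p_values)) ** k


# Hypothetical p-values: one strong pair among 80.
print(round(sidak_min_p([0.5] * 79 + [1e-4]), 4))
```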

Published examples

• Gene-environment: Science 2003, 301: 306
• Gene-environment: The Journal of Nutrition 2002, 8S: 132
• Gene-gene: Nature 2005, 436: 701

Further reading

• Cordell HJ (2002) Human Molecular Genetics 11: 2463-2468. A statistical review of epistasis, methods and definitions.

• Clayton D & McKeigue P (2001) The Lancet 358: 1357-1360. A critical appraisal of GxE research.

• Marchini J, Donnelly P & Cardon LR (2005) Nature Genetics 37: 413-417. Epistasis in whole-genome association studies.