Page 1: Significance of gene-gene interactions (epistasis)

PSB 2015 Tutorial

Marylyn D Ritchie, PhD
Director, Biomedical and Translational Informatics, Geisinger Clinic
Professor, Biochemistry and Molecular Biology, The Pennsylvania State University

Page 2: NHGRI GWA Catalog

www.genome.gov/GWAStudies
www.ebi.ac.uk/fgpt/gwas/

[Figure: Published genome-wide associations through 12/2013; published GWA at p ≤ 5 × 10⁻⁸ for 17 trait categories]

As of 12/18/14, the catalog includes 2,087 publications and 15,176 SNPs.

Page 3: Distribution of Effects

[Figure: histogram of the number of associations (0 to 1,000) by odds ratio (upper inclusive bound of each bin, from 1.2 to 40). Marylyn Ritchie, Jan 2014]

Page 4: Biology is complex

Page 5: Epistasis

• Epistasis: two or more genes interacting in a non-additive manner to confer disease risk; also called gene-gene interactions

[Figure: disease risk (0.0 to 1.0) plotted against the A-locus genotype (AA, Aa, aa), with one line per B-locus genotype (BB, Bb, bb)]

Disease risk p(D) by two-locus genotype:

        BB     Bb     bb
AA     0.0    0.0    1.0
Aa     0.0    0.50   0.0
aa     1.0    0.0    0.0
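Why this table describes interaction without main effects can be checked by averaging each row (or column) over the other locus. A minimal Python sketch, assuming (our assumption, not stated on the slide) that both loci are in Hardy-Weinberg equilibrium with allele frequencies of 0.5: every single-locus genotype then carries the same marginal risk of 0.25, so a one-SNP-at-a-time scan sees nothing even though the two-locus risks range from 0 to 1.

```python
# Two-locus penetrance table from the slide: P(disease | A-locus genotype, B-locus genotype).
PENETRANCE = {
    ("AA", "BB"): 0.0, ("AA", "Bb"): 0.0, ("AA", "bb"): 1.0,
    ("Aa", "BB"): 0.0, ("Aa", "Bb"): 0.5, ("Aa", "bb"): 0.0,
    ("aa", "BB"): 1.0, ("aa", "Bb"): 0.0, ("aa", "bb"): 0.0,
}

def hwe_freqs(p: float, hom_upper: str, het: str, hom_lower: str) -> dict[str, float]:
    """Genotype frequencies under Hardy-Weinberg equilibrium (an assumption),
    where p is the frequency of the upper-case allele."""
    q = 1.0 - p
    return {hom_upper: p * p, het: 2 * p * q, hom_lower: q * q}

def marginal_penetrance(p_a: float, p_b: float):
    """Single-locus risks: average the two-locus penetrances over the other locus."""
    a_freqs = hwe_freqs(p_a, "AA", "Aa", "aa")
    b_freqs = hwe_freqs(p_b, "BB", "Bb", "bb")
    by_a = {ga: sum(fb * PENETRANCE[(ga, gb)] for gb, fb in b_freqs.items()) for ga in a_freqs}
    by_b = {gb: sum(fa * PENETRANCE[(ga, gb)] for ga, fa in a_freqs.items()) for gb in b_freqs}
    return by_a, by_b

by_a, by_b = marginal_penetrance(0.5, 0.5)
print(by_a)  # {'AA': 0.25, 'Aa': 0.25, 'aa': 0.25}
print(by_b)  # {'BB': 0.25, 'Bb': 0.25, 'bb': 0.25}
```

The flat marginals depend on the allele frequencies: calling `marginal_penetrance(0.3, 0.5)`, for instance, leaves the B-locus marginal risks unequal, so a main effect appears at that locus.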

Page 6: Statistical Epistasis vs. Biological Epistasis

Moore and Williams, BioEssays 27:637–646, 2005

Page 7: Traditional Approach

Page 8: Traditional Statistical Approaches in Genetic Epidemiology: Association Analysis

• Typically test one marker or SNP at a time to detect loci exhibiting main effects
• Follow up with an analysis to detect interactions between the main-effect loci
• Some studies attempt to detect pairwise interactions even in the absence of main effects
• Higher-order interactions are usually not possible to detect with traditional methods

Page 9: Traditional Statistical Approaches in Genetic Epidemiology: Association Analysis

• Logistic regression
  - Small sample sizes can produce biased estimates of regression coefficients and spurious associations (Concato et al. 1993)
  - At least 10 cases or controls (whichever group is smaller) are needed per independent variable for reliable coefficient estimates (Peduzzi et al. 1996)
  - The underlying problem is the curse of dimensionality (Bellman 1961)
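To make the 10-per-variable rule concrete, the sketch below counts the parameters of a saturated genotype model. That coding is our assumption: each SNP has three genotype classes and every multilocus genotype cell gets its own parameter, giving 3^k - 1 parameters besides the intercept for k SNPs.

```python
def saturated_params(n_snps: int) -> int:
    """Parameters (excluding the intercept) in a saturated genotype model:
    one per multilocus genotype cell (3**n_snps cells) minus the reference cell."""
    return 3 ** n_snps - 1

def min_events(n_snps: int, events_per_variable: int = 10) -> int:
    """Smallest size of the smaller group (cases or controls) under the
    10-events-per-variable rule of thumb."""
    return events_per_variable * saturated_params(n_snps)

for k in range(1, 6):
    print(f"{k} SNP(s): {saturated_params(k):3d} parameters, need >= {min_events(k):4d} events")
```

Under these assumptions a four-SNP saturated model already has 80 parameters and needs at least 800 individuals in the smaller of the case and control groups.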

Page 10: Curse of Dimensionality

[Figure: N = 100 individuals (50 cases, 50 controls) distributed across the three genotype classes of SNP 1 (AA, Aa, aa)]

Page 11: Curse of Dimensionality

[Figure: the same N = 100 individuals (50 cases, 50 controls) distributed across the 3 × 3 = 9 two-locus genotype cells of SNP 1 (AA, Aa, aa) by SNP 2 (BB, Bb, bb)]

Page 12: Curse of Dimensionality

[Figure: the same N = 100 individuals (50 cases, 50 controls) distributed across the 3⁴ = 81 four-locus genotype cells of SNP 1 (AA, Aa, aa), SNP 2 (BB, Bb, bb), SNP 3 (CC, Cc, cc), and SNP 4 (DD, Dd, dd)]
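The sparseness in these three pictures can be quantified. The sketch below assumes, purely for illustration, that the 100 individuals fall into the genotype cells uniformly at random; real genotype frequencies are not uniform.

```python
def cell_stats(n_individuals: int, n_snps: int):
    """Number of genotype cells, mean individuals per cell, and expected empty cells,
    assuming (hypothetically) each individual lands in each cell with equal probability."""
    cells = 3 ** n_snps
    mean_per_cell = n_individuals / cells
    # Probability a given cell receives nobody is (1 - 1/cells) ** n_individuals.
    expected_empty = cells * (1 - 1 / cells) ** n_individuals
    return cells, mean_per_cell, expected_empty

for k in (1, 2, 3, 4):
    cells, mean, empty = cell_stats(100, k)
    print(f"{k} SNP(s): {cells:3d} cells, {mean:5.1f} per cell, ~{empty:4.1f} empty")
```

With four SNPs there are 81 cells, about 1.2 individuals per cell on average, and roughly 23 cells are expected to be empty, before cases and controls are even separated.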

Page 13

If interactions with minimal main effects are the norm rather than the exception, can we analyze all possible combinations of loci with traditional approaches to detect purely interaction effects?

NO

Page 14: How many combinations are there?

• ~500,000 SNPs to span the genome (HapMap)

SNPs in each subset    Number of possible combinations
1                      5 × 10⁵
2                      1 × 10¹¹
3                      2 × 10¹⁶
4                      3 × 10²¹
5                      2 × 10²⁶

Page 15: How many combinations are there? (continued)

2 × 10²⁶ combinations
at 1 combination per second
and 86,400 seconds per day
---------
2.979536 × 10²¹ days to complete
(8.163113 × 10¹⁸ years)

Page 16: How many combinations are there? (continued)

5 million SNPs in current technology:

# SNPs    # models       time**
1 SNP     5.00 × 10⁶     5 sec
2 SNPs    1.25 × 10¹³    144 days
3 SNPs    2.08 × 10¹⁹    2.4 × 10⁸ days
4 SNPs    2.60 × 10²⁵    3.01 × 10¹⁴ days
5 SNPs    2.60 × 10³¹    3.01 × 10²⁰ days

**assuming 1 CPU that performs 1 million tests per second
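The model counts are binomial coefficients: there are C(n, k) distinct k-SNP subsets of n SNPs. A short Python sketch that reproduces the 5-million-SNP table, using the slide's own assumption of one CPU performing one million tests per second (`math.comb` requires Python 3.8 or later):

```python
from math import comb

TESTS_PER_SECOND = 1_000_000  # slide's assumption: 1 CPU, 1 million tests per second
SECONDS_PER_DAY = 86_400

def search_cost(n_snps: int, order: int):
    """Number of order-way SNP combinations and the days needed to test them all."""
    models = comb(n_snps, order)
    days = models / TESTS_PER_SECOND / SECONDS_PER_DAY
    return models, days

for k in range(1, 6):
    models, days = search_cost(5_000_000, k)
    print(f"{k} SNP(s): {models:.2e} models, {days:.2e} days")
```

The two-SNP row, for example, works out to about 144.7 days, and each additional SNP in the subset multiplies the cost by roughly five to six orders of magnitude.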

Page 17

5.47 × 10¹² days

Page 18: Traditional Approach

• Advantages
  - Computationally feasible
  - Easy to interpret
• Disadvantages
  - Genes must have large main effects
  - Difficult to detect genes if interactions with other genetic and environmental factors are important
  - CANNOT do an exhaustive search

Page 19: New Statistical Approaches

• Review paper: Pharmacogenomics. 2007;8(9):1229-41.
• Reviews approximately 40 methods developed to detect gene-gene and gene-environment interactions

Page 20: New Statistical Approaches

Page 21: New Statistical Approaches

Page 22: Simple Fitness Landscape

[Figure: fitness plotted against model, shaped like a single smooth peak (Mt. Fuji)]

Page 23: Complex Fitness Landscape

[Figure: fitness plotted against model, shaped like a rugged landscape (Waimea Canyon)]
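The shape of the landscape matters because of how a search moves through model space. As a toy illustration (the two landscapes and the greedy search are our own hypothetical example, not a method from this tutorial), a simple hill-climbing search reaches the single peak of a smooth landscape from any starting point, but on a rugged landscape it stops at whichever local peak it climbs first.

```python
from typing import Callable

def hill_climb(fitness: Callable[[int], float], start: int, n_models: int) -> int:
    """Greedy search over models 0..n_models-1: step to the fitter neighbor
    until no neighbor improves on the current model, then return it."""
    current = start
    while True:
        neighbors = [m for m in (current - 1, current + 1) if 0 <= m < n_models]
        best = max(neighbors, key=fitness)
        if fitness(best) <= fitness(current):
            return current  # a peak, which may be only local
        current = best

# Hypothetical "Mt. Fuji" landscape: one smooth peak at model 10.
def simple(m: int) -> float:
    return -abs(m - 10)

# Hypothetical "Waimea Canyon" landscape: many local peaks, global peak at model 20.
RUGGED = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9, 3, 2, 3, 8, 4, 20]

def rugged(m: int) -> float:
    return RUGGED[m]

print(hill_climb(simple, start=0, n_models=21))  # 10, the global peak
print(hill_climb(rugged, start=6, n_models=21))  # 5, a local peak (fitness 9 vs. 20 at model 20)
```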

Page 24: Epistasis in GWAS Data

• Exhaustive evaluation
• Evaluate interactions in top hits from single-SNP analysis
• Use prior biological knowledge to evaluate specific combinations (“Candidate Epistasis”)

Carlson CS, Eberle MA, Kruglyak L, Nickerson DA. Mapping complex disease loci in whole-genome association studies. Nature 2004 May 27;429(6990):446-52.
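The second strategy trades completeness for tractability. A minimal sketch, with hypothetical SNP IDs and p-values, that keeps the k most significant single-SNP results and forms all pairs among them; `top_hit_pairs` is our own illustrative helper, not part of any tool named here.

```python
from itertools import combinations
from math import comb

def top_hit_pairs(pvalues: dict[str, float], k: int) -> list[tuple[str, str]]:
    """All SNP pairs among the k most significant single-SNP results."""
    top = sorted(pvalues, key=pvalues.get)[:k]
    return list(combinations(top, 2))

# Hypothetical single-SNP p-values (illustrative only).
pvals = {"rs1": 3e-9, "rs2": 0.04, "rs3": 1e-6, "rs4": 0.6, "rs5": 2e-4}
print(top_hit_pairs(pvals, 3))  # [('rs1', 'rs3'), ('rs1', 'rs5'), ('rs3', 'rs5')]

# Scale of the reduction: exhaustive 2-SNP scan of 500,000 SNPs vs. pairs among 1,000 top hits.
print(comb(500_000, 2), comb(1_000, 2))  # 124999750000 499500
```

The filter only sees pairs whose members already show marginal signal, so a purely interactive model such as the penetrance table on page 5 would be missed.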

Page 25

Goal: to build biologically plausible models of gene-gene interactions to test for association, using an automated bioinformatics tool based on biological features

Page 26: The Biofilter

• Use publicly available databases to establish relationships between gene products
• Suggest biological epistasis between genes
• Integrate information from the genome, transcriptome, and proteome into the analysis

Bush WS, Dudek SM, Ritchie MD. Biofilter: a knowledge-integration system for the multi-locus analysis of genome-wide association studies. Pacific Symposium on Biocomputing, 368-79 (2009).

Page 27: LOKI: Library of Knowledge Integration

Bush WS, Dudek SM, Ritchie MD. Biofilter: a knowledge-integration system for the multi-locus analysis of genome-wide association studies. Pacific Symposium on Biocomputing, 368-79 (2009).

Page 28

[Figure: Biofilter workflow in three stages (1 Filtering, 2 Annotation, 3 Modeling). Boxes: List of Loci or Regions; Link Loci or Regions to Genes in LOKI; Secondary List (Biofilter Source or User Provided); Intersection of the Two Lists; Biofilter Source(s) to Annotate List; Annotated List of Loci (Loci 1 to 7, each with CHR, BP, RSID, Gene); Link LOKI Genes to Sources/Groups (Gene 1, Gene 2; Source 1, Source 2; Group 1, Group 2); Generate Pairwise Interaction Models and Implication Indices]
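The modeling stage can be sketched in a few lines. This is not Biofilter's code or API: the SNP-to-gene map, the source/group annotations, and the helper `pairwise_models` are hypothetical, and we assume, as a simplification, that the implication index of a gene pair is the number of distinct sources placing both genes in a common group.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical inputs, loosely mirroring the workflow diagram.
SNP_TO_GENE = {"rs1": "GENE1", "rs2": "GENE2", "rs3": "GENE3"}
# Each annotation: (source, group, genes in that group).
GROUPS = [
    ("Source1", "Group1", {"GENE1", "GENE2"}),
    ("Source2", "Group2", {"GENE1", "GENE2", "GENE3"}),
]

def pairwise_models(snp_to_gene, groups):
    """Propose SNP-SNP models for genes sharing a group, scored by the number of
    distinct sources linking the two genes (a simplified implication index)."""
    sources_per_pair = defaultdict(set)
    for source, _group, genes in groups:
        for g1, g2 in combinations(sorted(genes), 2):
            sources_per_pair[(g1, g2)].add(source)
    models = []
    for s1, s2 in combinations(sorted(snp_to_gene), 2):
        pair = tuple(sorted((snp_to_gene[s1], snp_to_gene[s2])))
        if pair in sources_per_pair:
            models.append((s1, s2, len(sources_per_pair[pair])))
    return sorted(models, key=lambda m: -m[2])  # best-supported models first

for model in pairwise_models(SNP_TO_GENE, GROUPS):
    print(model)
```

Ranking by source support means a pair backed by several independent databases is tested before a pair suggested by only one, which keeps the number of tests far below an exhaustive scan.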

Page 29: Summary

• Gene-gene interactions are important components of the genetic architecture of complex traits
• Gene-gene interactions are challenging to detect:
  - due to data sparseness in high dimensions
  - due to the combinatorics of the search
  - due to complexity
• Much research is ongoing to develop novel methods and strategies to address these issues

