Pathway analysis 2012

Date post: 13-Jul-2015
Upload: stephen-turner
Page 1: Pathway analysis 2012

Pathway Analysis: Adding Functional Context to High-Throughput Results

Stephen D. Turner, Ph.D., Bioinformatics Core Director, [email protected]

bioinformatics.virginia.edu

Outline
• Bioinformatics & the Bioinformatics Core
• Service Highlight: Pathway Analysis
• IPA demo

December 20, 2012 · bioinformatics.virginia.edu

Bioinformatics Origins
• Rooted in sequence analysis
• Driven by the need to:
  – Collect
  – Annotate
  – Analyze

What is bioinformatics?

(Diagram modified from @drewconway)

What is bioinformatics?

“There is a tremendous amount of information regarding evolutionary history and biochemical function implicit in each sequence and the number of known sequences is growing explosively. We feel it is important to collect this significant information, correlate it into a unified whole and interpret it.”

M. Dayhoff, February 27, 1967

UVA Bioinformatics Core

• A centralized resource for providing expert and timely bioinformatics consulting and data analysis.

• Main goals: help you publish and get funding.
  – 1. Service
  – 2. Training

(Diagram: Sample prep, Sequencing, Raw data, then Differential expression, Gene identification, Novel genes, Discoveries, etc.)

This is the “stuff” we do in the bioinformatics core!

Find out what this “stuff” is at bioinformatics.virginia.edu

Services
• Gene expression: Microarray Analysis
• Gene expression: RNA-seq Analysis
• Pathway analysis
• DNA Variation (GWAS, NGS)
• DNA Binding / ChIP-Seq
• DNA Methylation
• Grant / Manuscript support
• Custom development

Services
Gene expression: Microarray Analysis
• Accession and analysis of publicly available data (e.g. GEO, ArrayExpress).
• Preprocessing: background subtraction, summarization, and quantile normalization using the RMA (Robust Multichip Average) expression measure described in Irizarry et al. (2003), Biostatistics 4:249-264.
• Quality assessment:
  – Visualization of the signal intensity distribution of each array using boxplots and density plots.
  – MA plots to visualize the log intensity ratio (M) against the average log intensity (A).
  – Principal components analysis to visualize the overall (dis)similarity between arrays.
• Analysis:
  – Estimation of fold changes and standard errors using a linear model.
  – Empirical Bayes smoothing of standard errors.
  – Lists of top differentially expressed genes with fold changes, statistical significance, and multiple testing correction.
• Visualization:
  – Heatmaps and dendrograms.
  – Volcano plots to visualize statistical significance against fold change.
• Biological context:
  – Pathway/Functional Analysis.
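The quantile normalization and MA-plot steps above can be illustrated with a short Python sketch. It is a minimal example under stated assumptions, not the RMA implementation described by Irizarry et al.: each array is a plain list of intensities of equal length, ties are broken by position, and no background correction or probe-set summarization is done. The function names `quantile_normalize` and `ma_values` are hypothetical.

```python
import math


def quantile_normalize(arrays):
    """Quantile-normalize arrays (lists of intensities of equal length).

    The value at rank k in every array is replaced by the mean of the
    rank-k values across all arrays. Ties are broken by position.
    """
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("all arrays must have the same number of probes")
    sorted_arrays = [sorted(a) for a in arrays]
    # rank_means[k] = mean of the k-th smallest value over all arrays
    rank_means = [sum(s[k] for s in sorted_arrays) / len(arrays) for k in range(n)]
    normalized = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])  # indices, smallest value first
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized


def ma_values(x, y):
    """Per probe: M = log2(x) - log2(y) and A = mean of log2(x) and log2(y)."""
    m = [math.log2(a) - math.log2(b) for a, b in zip(x, y)]
    avg = [(math.log2(a) + math.log2(b)) / 2 for a, b in zip(x, y)]
    return m, avg


# Hypothetical toy data: three arrays, four probes each
arrays = [[5, 2, 3, 4], [4, 1, 4, 2], [3, 4, 6, 8]]
norm = quantile_normalize(arrays)
m, avg = ma_values(norm[0], norm[1])
```

After normalization every array holds the same set of values (only their order differs), which is what the boxplots and density plots in the quality-assessment step would show.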

Services
Gene expression: RNA-seq
• Pre-alignment quality assessment:
  – Per-base sequence quality
  – Per-base sequence content
  – Per-base GC content
  – Search for overrepresented sequences (adapters, primers, etc.)
• Alignment to a reference genome:
  – Homo sapiens
  – Mus musculus
  – Rattus norvegicus
  – Bos taurus
  – Canis familiaris
  – Gallus gallus
  – Drosophila melanogaster
  – Arabidopsis thaliana
  – Caenorhabditis elegans
  – Saccharomyces cerevisiae
• Post-alignment quality assessment:
  – Flagging duplicate reads
  – Estimation of library complexity
  – Insert size distribution (for paired-end sequencing)
  – Analysis of coverage over transcript position
• Transcript assembly
• Differential expression testing:
  – Isoforms
  – Genes
  – Primary transcripts
  – Coding sequence
• Differential splicing analysis
• Differential coding output
• Differential promoter use
• Visualization: assistance with visualization using IGV.
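The per-base quality and GC-content checks listed under pre-alignment quality assessment can be sketched as follows. This is an illustrative example of the kind of summary quality-control tools report, not any tool's actual code; it assumes 4-line FASTQ records and Phred+33 quality encoding, and `parse_fastq` and `per_base_stats` are hypothetical helper names.

```python
from io import StringIO


def parse_fastq(handle):
    """Yield (sequence, quality) pairs from a FASTQ stream with 4-line records."""
    while True:
        header = handle.readline()
        if not header:
            return
        seq = handle.readline().strip()
        handle.readline()  # '+' separator line
        qual = handle.readline().strip()
        if not header.startswith("@") or len(seq) != len(qual):
            raise ValueError("malformed FASTQ record")
        yield seq, qual


def per_base_stats(records, offset=33):
    """Mean Phred quality and GC fraction at each read position.

    Assumes Phred+33 encoding (change `offset` otherwise). Reads may differ
    in length; each position is averaged over the reads that reach it.
    """
    qual_sum, gc_count, n_reads = [], [], []
    for seq, qual in records:
        for pos, (base, q) in enumerate(zip(seq, qual)):
            if pos == len(n_reads):  # first read long enough to reach this position
                qual_sum.append(0)
                gc_count.append(0)
                n_reads.append(0)
            qual_sum[pos] += ord(q) - offset
            gc_count[pos] += base in "GCgc"
            n_reads[pos] += 1
    mean_q = [s / n for s, n in zip(qual_sum, n_reads)]
    gc_frac = [g / n for g, n in zip(gc_count, n_reads)]
    return mean_q, gc_frac


# Hypothetical two-read FASTQ ('I' = Phred 40, '#' = Phred 2 in Phred+33)
sample = "@r1\nGCAT\n+\nIIII\n@r2\nGGTA\n+\n#III\n"
mean_q, gc_frac = per_base_stats(parse_fastq(StringIO(sample)))
```

A drop in mean quality toward the 3' end, or a GC fraction that swings sharply at the first few positions, is the sort of pattern these per-base summaries are meant to expose.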

Services
DNA Variation: Genotyping
• Study design & power calculations for SNP genotype-phenotype association studies
• Data management and quality control
• PCA for population stratification control
• Imputation to a reference population (e.g. HapMap, 1000 Genomes)
• Analysis, interpretation, visualization
• Manuscript preparation
• Grant support (compliance with NIH data sharing policies, methodology for data management, design, analysis, and interpretation)
• Acquisition of publicly available data (dbGaP)
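The PCA step for population stratification control can be sketched on a toy genotype matrix. Assumptions: genotypes are coded as 0/1/2 allele counts, each SNP is centered and scaled by sqrt(p(1 - p)) (a scaling commonly used for this purpose, for example by EIGENSTRAT), and only the first component is computed, by power iteration, to keep the example dependency-free. Function names and data are hypothetical; real analyses use dedicated software on many thousands of SNPs.

```python
import math
import random


def standardize_genotypes(genotypes):
    """Center and scale a samples x SNPs matrix of 0/1/2 allele counts.

    Each SNP is centered on its mean and divided by sqrt(p * (1 - p)), where
    p = mean / 2 is the estimated allele frequency. Monomorphic SNPs are dropped.
    """
    n = len(genotypes)
    columns = []
    for j in range(len(genotypes[0])):
        col = [row[j] for row in genotypes]
        mu = sum(col) / n
        p = mu / 2
        if p in (0.0, 1.0):
            continue  # no variation, so no information about ancestry
        sd = math.sqrt(p * (1 - p))
        columns.append([(g - mu) / sd for g in col])
    return [list(row) for row in zip(*columns)]  # back to samples x SNPs


def top_pc(x, iters=500, seed=0):
    """Sample scores on the first principal component, via power iteration."""
    n, m = len(x), len(x[0])
    # Sample-by-sample relationship matrix: X X^T / m
    cov = [[sum(x[a][k] * x[b][k] for k in range(m)) / m for b in range(n)] for a in range(n)]
    rng = random.Random(seed)
    v = [rng.random() for _ in range(n)]
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(n)) for a in range(n)]
        norm = math.sqrt(sum(val * val for val in w))
        v = [val / norm for val in w]
    return v  # unit length; the overall sign is arbitrary


# Hypothetical genotypes: rows 0-2 from one population, rows 3-5 from another
genotypes = [
    [0, 0, 2, 2], [0, 1, 2, 2], [1, 0, 2, 1],
    [2, 2, 0, 0], [2, 1, 0, 0], [1, 2, 0, 1],
]
pc1 = top_pc(standardize_genotypes(genotypes))
```

Samples from the two hypothetical populations end up with opposite signs on the first component. In an association analysis, the top few component scores are then typically included as covariates to adjust for ancestry.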

DNA Variation: Next-Gen Sequencing
• Alignment to a reference genome
• Calibration of quality scores and duplicate read removal
• Variant calling
• Variant annotation
• SNP effect prediction
• De novo assembly
• Any of the applicable analysis, interpretation, and visualization services described above for genotyping data.

Service Highlight: “Pathway Analysis”
• You’ve done your microarray/RNA-Seq experiment:
  – You have a list of genes.
  – You want to put these into functional context.
  – What biological processes are perturbed?
  – What pathways are being dysregulated?
  – Data reduction: hundreds or thousands of genes can be reduced to tens of pathways.
  – Identifying active pathways gives more explanatory power than a gene list alone.
• “Pathway analysis” encompasses many, many techniques:
  1. 1st Generation: Over-representation Analysis (e.g. GO ORA)
  2. 2nd Generation: Functional Class Scoring (e.g. GSEA)
  3. 3rd Generation (in development): Pathway Topology (e.g. SPIA)
• bit.ly/pathway-analysis

Over-representation analysis (ORA)
• Many variations on the same theme: statistically evaluates the fraction of genes in a particular pathway that show changes in expression.
• Algorithm:
  1. Create input list (e.g. “significant at p<0.05”).
  2. For each gene set:
     a. Count the number of input genes in the set.
     b. Count the number of “background” genes (e.g. all genes on the platform) in the set.
  3. Test each pathway for over-representation of input genes.
• Gene set: typically a Gene Ontology (GO) term.
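For step 3, a common choice is a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test): with N background genes of which K are in the set, and n input genes of which k are in the set, the p-value is P(X ≥ k). The sketch below assumes gene identifiers are plain strings and restricts both the input list and the gene sets to the background; `ora` and `hypergeom_upper_tail` are hypothetical names, and no multiple-testing correction is applied.

```python
from math import comb


def hypergeom_upper_tail(k, n_input, n_set, n_background):
    """P(X >= k) for X ~ Hypergeometric(N=n_background, K=n_set, n=n_input)."""
    total = comb(n_background, n_input)
    tail = sum(
        comb(n_set, i) * comb(n_background - n_set, n_input - i)
        for i in range(k, min(n_input, n_set) + 1)
    )
    return tail / total


def ora(input_genes, gene_sets, background):
    """One-sided over-representation p-value for every gene set.

    input_genes : genes passing the cutoff (e.g. significant at p < 0.05)
    gene_sets   : dict mapping set name -> member genes (e.g. a GO term's genes)
    background  : all genes measured (e.g. all genes on the platform)
    """
    universe = set(background)
    hits = set(input_genes) & universe
    pvalues = {}
    for name, members in gene_sets.items():
        annotated = set(members) & universe  # step 2b: background genes in the set
        k = len(hits & annotated)            # step 2a: input genes in the set
        # step 3: hypergeometric upper tail
        pvalues[name] = hypergeom_upper_tail(k, len(hits), len(annotated), len(universe))
    return pvalues


# Hypothetical toy data: 10 background genes, 4 input genes
background = [f"g{i}" for i in range(10)]
gene_sets = {"set_S": ["g0", "g1", "g2"], "set_T": ["g5", "g6", "g7", "g8", "g9"]}
pvalues = ora(["g0", "g1", "g2", "g3"], gene_sets, background)
```

Here all three members of `set_S` are among the four input genes, giving P = 7/210 = 1/30, while `set_T` contains no input genes and gets P = 1.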

Gene Ontology
• Ontology = formal representation of a knowledge domain.
• Gene Ontology: the domain is cell biology, covering molecular function, biological process, and cellular component.
• GO is represented as a directed acyclic graph (DAG):
  – Terms are nodes, relationships are edges.
  – Parent terms are more general than their child terms.
  – Unlike in a simple tree, terms can have multiple parents.
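Because a term can have several parents, finding everything above a term means traversing a DAG rather than walking up a single chain. The sketch below uses a made-up three-term hierarchy (not real GO terms) to show how direct gene annotations are propagated to all ancestor terms, which is how genes annotated to a specific term are usually counted under its more general parents as well. The names `ancestors` and `propagate` are hypothetical.

```python
def ancestors(term, parents):
    """All ancestors of `term` in a DAG given as {child: [parent, ...]}."""
    seen = set()
    stack = list(parents.get(term, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:  # a term reachable by two paths is visited once
            seen.add(parent)
            stack.extend(parents.get(parent, []))
    return seen


def propagate(annotations, parents):
    """Add every ancestor term to each gene's direct annotations."""
    return {
        gene: set(terms).union(*(ancestors(t, parents) for t in terms))
        for gene, terms in annotations.items()
    }


# Toy hierarchy (not real GO terms): C has two parents, A and B, which share a root
parents = {"C": ["A", "B"], "A": ["root"], "B": ["root"]}
annotations = {"gene1": ["C"], "gene2": ["A"]}
full = propagate(annotations, parents)
```

`gene1` ends up annotated to C, A, B, and root. This overlap between parent and child terms is also why tests on different GO terms are not independent of one another.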

Rhee, S. Y., Wood, V., Dolinski, K., & Draghici, S. (2008). Use and misuse of the gene ontology annotations. Nature Reviews Genetics, 9(7), 509-515. doi:10.1038/nrg2363

GO ORA: Example
• Algorithm:
  1. Create input list (e.g. “significant at p<0.05”).
  2. For each gene set:
     a. Count the number of input genes in the set.
     b. Count the number of “background” genes (e.g. all genes on the platform) in the set.
  3. Test each pathway for over-representation of input genes.
• Ex: GO “Purine Ribonucleotide Biosynthetic Process”
  – 1% of input (significant) genes are annotated with this term.
  – 1% of genes on the chip are annotated with this term.
  – Not significantly over-represented.
• Ex: GO “V(D)J Recombination”
  – 20% of input (significant) genes are annotated with this term.
  – 1% of genes on the chip are annotated with this term.
  – Highly significantly over-represented!
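To make the two examples concrete, assume hypothetical counts (the slide gives only percentages): N = 10,000 genes on the chip, n = 100 significant input genes, and K = 100 genes annotated with the term (1% of the chip). Under the hypergeometric model commonly used for ORA, the expected overlap is one gene:

```latex
P(X \ge k) = \sum_{i=k}^{\min(n,K)} \frac{\binom{K}{i}\binom{N-K}{n-i}}{\binom{N}{n}},
\qquad
\mathbb{E}[X] = \frac{nK}{N} = \frac{100 \times 100}{10\,000} = 1
```

In the purine ribonucleotide case, 1% of inputs means k = 1, and P(X ≥ 1) ≈ 0.64, nowhere near significance. In the V(D)J recombination case, 20% of inputs means k = 20, a 20-fold excess over expectation, and P(X ≥ 20) is far below 10^-15.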

GO ORA: Example

GO ORA: Limitations
• Some categories are so general they’re meaningless (e.g. “cellular process”).

• ORA uses only genes above a cutoff and discards everything else.

• ORA uses only the number of genes and ignores their measured changes.

• Two assumptions are violated:
  – Genes are independent (NOT! Coexpression, interaction, etc.).
  – Pathways are independent (violated by definition by the DAG structure).

Functional Class Scoring
• Theory: while large changes in individual genes can have significant effects on pathways, weaker but coordinated changes in sets of functionally related genes can also have significant effects.

• General algorithm:
  1. Compute a gene-level statistic (e.g. fold change, Student’s t).
  2. Aggregate the gene-level statistics for all genes in a pathway into a single pathway-level statistic.
  3. Assess significance with permutation.
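A minimal sketch of the three steps, assuming Student's pooled two-sample t as the gene-level statistic, the mean as the aggregation, and a one-sided permutation test on sample labels. These are illustrative choices among many; `fcs_test`, `pathway_score`, and `t_stat` are hypothetical names, and each group needs at least two samples with non-constant values.

```python
import math
import random
from statistics import mean, variance


def t_stat(x, y):
    """Student's two-sample t statistic with pooled variance."""
    nx, ny = len(x), len(y)
    pooled = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / math.sqrt(pooled * (1 / nx + 1 / ny))


def pathway_score(expr, labels, gene_set):
    """Steps 1 and 2: gene-level t statistics aggregated by their mean."""
    stats = []
    for gene in gene_set:
        group1 = [v for v, lab in zip(expr[gene], labels) if lab == 1]
        group0 = [v for v, lab in zip(expr[gene], labels) if lab == 0]
        stats.append(t_stat(group1, group0))
    return mean(stats)


def fcs_test(expr, labels, gene_set, n_perm=1000, seed=1):
    """Step 3: one-sided permutation p-value obtained by shuffling sample labels."""
    rng = random.Random(seed)
    observed = pathway_score(expr, labels, gene_set)
    perm = list(labels)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(perm)
        if pathway_score(expr, perm, gene_set) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)


# Hypothetical log-expression values: 3 cases (label 1) then 3 controls (label 0)
expr = {
    "gA": [5.1, 5.3, 4.9, 1.0, 1.2, 0.8],
    "gB": [6.0, 6.2, 5.8, 2.1, 1.9, 2.0],
    "gC": [3.0, 3.1, 2.9, 3.05, 2.95, 3.0],
}
labels = [1, 1, 1, 0, 0, 0]
score, p_value = fcs_test(expr, labels, ["gA", "gB"])
```

With only three samples per group there are just C(6,3) = 20 distinct labelings, so the permutation p-value cannot fall much below 1/20 however strong the signal. This resolution limit is one reason permutation-based methods need reasonable sample sizes.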

Gene Set Enrichment Analysis
1. Calculate an Enrichment Score:
   a) Rank genes by their expression difference.
   b) For each gene set*:
      i. Compute a cumulative sum over the ranked genes:
         1. Increase the sum when a gene is in the set, decrease it otherwise.
         2. The magnitude of the increment depends on the gene-phenotype correlation.
      ii. Record the maximum deviation from zero as the Enrichment Score (ES).
2. Assess significance:
   a) Permute the phenotype labels (or gene labels) 1000 times.
   b) Compute the ES for each permutation (empirical null).
   c) Compare the ES for the actual data to the distribution of ES from the permuted data.
   d) Normalize the ES to account for gene set size (NES).
   e) Control for multiple testing by calculating an FDR for each NES.

• * Gene sets come from MSigDB:
  – http://www.broadinstitute.org/gsea/msigdb/index.jsp
  – MSigDB is a collection of annotated gene sets for use with the GSEA software.
  – Collections: positional, curated, computationally predicted, GO.
  – Curated: KEGG, Reactome, STKE, etc.
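Step 1 can be sketched as a weighted running sum. This assumes, as we understand the GSEA statistic, that a gene in the set adds |r|^p divided by the sum of |r|^p over all set members, while every gene outside the set subtracts 1/(N - N_H), where N is the number of ranked genes and N_H the number in the set. With p = 1 the steps are weighted by correlation; with p = 0 every hit counts equally. `enrichment_score` is a hypothetical name, and the permutation, NES, and FDR steps are not shown.

```python
def enrichment_score(ranked, gene_set, p=1.0):
    """Weighted running-sum enrichment score (GSEA-style, step 1 only).

    ranked   : list of (gene, correlation) pairs sorted from most positive to most negative
    gene_set : iterable of gene identifiers
    p        : exponent on |correlation|; p = 0 gives equal steps for all hits
    Returns the running-sum value with the largest absolute deviation from zero.
    """
    members = set(gene_set)
    hits = [gene in members for gene, _ in ranked]
    n_total, n_hit = len(ranked), sum(hits)
    if n_hit == 0 or n_hit == n_total:
        raise ValueError("gene set must contain some, but not all, ranked genes")
    hit_norm = sum(abs(r) ** p for (_, r), h in zip(ranked, hits) if h)
    miss_step = 1.0 / (n_total - n_hit)
    running, es = 0.0, 0.0
    for (_, r), h in zip(ranked, hits):
        # step up by the gene's weight if it is in the set, otherwise step down
        running += abs(r) ** p / hit_norm if h else -miss_step
        if abs(running) > abs(es):
            es = running
    return es


# Hypothetical ranked list of five genes
ranked = [("a", 3.0), ("b", 2.0), ("c", 1.0), ("d", -1.0), ("e", -2.0)]
es_top = enrichment_score(ranked, {"a", "b"})     # set concentrated at the top
es_bottom = enrichment_score(ranked, {"d", "e"})  # set concentrated at the bottom
```

A set concentrated at the top of the ranking gives a large positive ES, and one concentrated at the bottom gives a large negative ES; the sign tells you the direction of enrichment.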

GSEA: Example

FCS/GSEA: Limitations
• Violate the same assumptions as GO ORA:
  – Genes are independent.
  – Pathways are independent.

• Only consider the number/magnitude of gene changes, and ignore other information in pathway databases:
  – Directionality of the interaction.
  – Nature of the interaction (activation, inhibition, etc.).
  – Where the interaction occurs (nucleus, cytoplasm, etc.).

Pathway Topology: SPIA
• Utilizes directionality, function, and topology.
• Computes two orthogonal (independent) p-values:
  – pNDE: based on the number of differentially expressed genes (as in ORA).
  – pPERT: based on the degree of perturbation.
• pG is the overall p-value (pNDE and pPERT combined).
• pGFDR is the overall FDR-corrected p-value.
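The perturbation part of SPIA can be sketched as a linear system. As we understand the method, each gene's perturbation factor is PF(g) = ΔE(g) + Σ_u β(u→g) · PF(u) / N_ds(u), where ΔE is the log fold change (0 for genes that are not differentially expressed), β is +1 for activation and -1 for inhibition, and N_ds(u) is the number of genes directly downstream of u; the net accumulation is the sum of PF(g) - ΔE(g). SPIA itself is an R/Bioconductor package. The Python below is a hypothetical re-derivation that omits the bootstrap used to turn the accumulation into pPERT, and `perturbation_factors` and `solve` are illustrative names.

```python
def perturbation_factors(genes, edges, delta_e):
    """Solve PF = dE + B @ PF for a pathway graph (SPIA-style perturbation factors).

    genes   : list of gene names
    edges   : list of (upstream, downstream, beta), beta = +1 activation, -1 inhibition
    delta_e : dict gene -> log fold change (missing genes are treated as 0)
    Returns (dict gene -> PF, total net accumulation sum(PF - dE)).
    """
    idx = {g: i for i, g in enumerate(genes)}
    n = len(genes)
    n_down = [0] * n
    for up, _, _ in edges:
        n_down[idx[up]] += 1
    # A = I - B, where B[i][j] = beta / N_ds(j) for an edge j -> i
    a = [[float(i == j) for j in range(n)] for i in range(n)]
    for up, down, beta in edges:
        a[idx[down]][idx[up]] -= beta / n_down[idx[up]]
    b = [float(delta_e.get(g, 0.0)) for g in genes]
    pf = solve(a, list(b))
    acc = sum(p - d for p, d in zip(pf, b))
    return dict(zip(genes, pf)), acc


def solve(a, b):
    """Gaussian elimination with partial pivoting (modifies a and b in place)."""
    n = len(b)
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("singular system: perturbation factors not uniquely defined")
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (b[r] - sum(a[r][c] * x[c] for c in range(r + 1, n))) / a[r][r]
    return x


# Hypothetical pathway: g1 activates both g2 and g3
genes = ["g1", "g2", "g3"]
edges = [("g1", "g2", +1), ("g1", "g3", +1)]
pf, acc = perturbation_factors(genes, edges, {"g1": 2.0, "g2": 0.5})
```

Here g3 is not differentially expressed yet still receives a perturbation of 1.0 from its upstream activator, which is exactly the kind of signal ORA and FCS cannot see.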

Pathway Topology: SPIA
• TCR Signaling Pathway results:
  – pNDE: 6.5e-9
  – pPERT: 0.29
  – pGFDR: 1.2e-6
  – Conclusion: many differentially expressed genes, but the pathway may not be strongly perturbed.
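The combined pG can be recomputed from the two reported p-values. This assumes SPIA's default combination is Fisher's method for two independent p-values, which has the closed form pG = c - c·ln(c) with c = pNDE × pPERT; the function name is hypothetical. The reported pGFDR would then come from adjusting pG across all pathways tested, which is not reproduced here.

```python
import math


def combine_two_pvalues(p1, p2):
    """Fisher's method for two independent p-values, in closed form.

    With c = p1 * p2, the combined p-value is P(U1 * U2 <= c) = c - c * ln(c)
    for independent uniforms U1, U2 (equivalently, a chi-square test with 4 df).
    """
    if not (0.0 <= p1 <= 1.0 and 0.0 <= p2 <= 1.0):
        raise ValueError("p-values must lie in [0, 1]")
    c = p1 * p2
    if c == 0.0:
        return 0.0
    return c - c * math.log(c)


# Values reported for the TCR signaling pathway on the slide
p_nde, p_pert = 6.5e-9, 0.29
p_g = combine_two_pvalues(p_nde, p_pert)
print(f"pG = {p_g:.2e}")
```

Here pG comes out at roughly 4e-8, driven almost entirely by pNDE; with pPERT = 0.29 the topology adds little evidence, consistent with the conclusion that the pathway may not be strongly perturbed.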

Pathway Topology / SPIA: Limitations
• With SPIA, you still need an arbitrary “cutoff”, e.g. the top 500 genes, or p<0.05.

• True topology depends on the cell type, because gene expression profiles are cell-specific.

• Tissue-specific topology is rarely available and is fragmented across databases, even where it is fully understood.

• Other general limitations of pathway analysis also apply (next slide).

Pathway Analysis: General Limitations
• Low-resolution knowledge bases:
  – E.g. RNA-seq studies have found that >90% of human genes are alternatively spliced.
  – Different transcripts can have different or opposing functions.
• Incomplete/inaccurate annotations:
  – Oct 2007: 95% of GO annotations were inferred electronically (i.e. not manually curated).
• Missing condition- and cell-specific information.
• Methodological challenge: lack of benchmarks.

Pathway Analysis: Conclusions

Pathway analysis gives you more biological insight than staring at lists of genes.

Pathway analysis is complex and has many limitations.

Pathway analysis is still more of an exploratory procedure than a pure statistical endpoint.

The best conclusions are made by viewing enrichment analysis results through the lens of the investigator’s expert biological knowledge.

IPA Demo
• Background: microarray data from childhood exacerbated asthma compared to the normal state.

• Questions: Do the data support involvement of immune/inflammatory responses and viral infection in the acute asthma attack?

• Tasks:
  – View canonical pathways that contain significant numbers of genes from this dataset.
  – Overlay a Function/Disease state that shows how key signaling pathways for fighting off respiratory infections overlap with asthmatic inflammation.
  – Overlay Biomarkers that identify genes in the infection signaling pathway that are also used for diagnosis and as efficacy indicators for asthma treatments.
  – Search the Ingenuity Knowledge Base for literature references that support your findings.
  – Investigate a “weird” finding…

Thank you

Web: bioinformatics.virginia.edu

E-mail: [email protected]

Blog: www.GettingGeneticsDone.com

Twitter: twitter.com/genetics_blog
