Detangling genetic “hairballs”: implementing an abstracted, gene cluster view for the gPICTviz visualization tool.

Kathleen E. Kyle1, Karan Sapra3, Amari Lewis2, Melissa C. Smith3, Jill Gemmill3, F. Alex Feltus4

1Florida State University, Tallahassee, FL, USA; 2Winston-Salem State University, Winston-Salem, NC, USA; 3Departments of Electrical and Computer Engineering and 4Genetics & Biochemistry, Clemson University, SC, USA.

REU Site: Research Experience for Undergraduates in Collaborative Data Visualization Applications • June 1 – July 24, 2015 • Clemson University • Clemson, South Carolina

- REU Funded by NSF ACI Award 1359223, Vetria L. Byrd, PI

- Sponsored by Clemson University NSF Award #2009134

- ACCAC Summer Research Scholar Program

Abstract

New sequencing technologies have catapulted biologists into the realm of big data, where the need for insight has sparked some interesting new biological visualizations. The ability to sequence entire genomes allows biologists to ask questions at the gene systems level. However, trying to understand gene networks that have tens of thousands of individual genes is challenging. Even when visualized, gene networks are so large and complex that the visualization itself may overwhelm the user, and useful knowledge may be concealed in the plethora of visual information. The gene Product Interaction Conserved Topology visualizer (gPICTviz) is a complex piece of visualization software that depicts individual genes as nodes and relationships between genes as edges displayed in a force-directed layout. This tool can illustrate thousands of nodes and exponentially more edges in a single overwhelming visual. The goal of this research was to identify clusters of gene nodes to represent in an abstracted visual that a user can then choose to expand as needed in order to gain more detailed insight. This not only provides a more robust visualization of complex networks but also saves compute time, because the graphical component of the visualization is the most computationally taxing. To accomplish the visualization, OpenGL was used, along with C code for speed and MCL-edge software for cluster identification. The result was a consolidated, more user-friendly gene network visualization that aids biological researchers in finding the answers to questions about or within the genome while promoting further integration of genetics and systems biology.

Methods & Tools

MCL-edge
• Identify gene clusters

Python
• Cluster ID generation/file reformatting

C
• Store into data structures

OpenGL/C
• Visualize
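The Python step of the pipeline above (cluster ID generation and file reformatting) could look roughly like the sketch below. It is not the authors' script. It assumes the clustering output lists one cluster per line with whitespace-separated gene IDs, which is how MCL reports clusters in label mode, and it invents a flat "gene<TAB>cluster_id" record layout for the C loader. The function names `assign_cluster_ids` and `to_flat_records` and the gene names are hypothetical.

```python
from typing import Dict, Iterable, List


def assign_cluster_ids(mcl_lines: Iterable[str]) -> Dict[str, int]:
    """Map each gene ID to a cluster ID, numbering non-empty lines from 0.

    Assumption: one cluster per line, gene IDs separated by whitespace.
    """
    gene_to_cluster: Dict[str, int] = {}
    cluster_id = 0
    for line in mcl_lines:
        genes = line.split()
        if not genes:
            continue  # blank lines do not consume a cluster ID
        for gene in genes:
            gene_to_cluster[gene] = cluster_id
        cluster_id += 1
    return gene_to_cluster


def to_flat_records(gene_to_cluster: Dict[str, int]) -> List[str]:
    """Emit one 'gene<TAB>cluster_id' record per gene, sorted by gene ID.

    Hypothetical layout: chosen so a C reader could scan it line by line.
    """
    return [f"{gene}\t{cid}" for gene, cid in sorted(gene_to_cluster.items())]


if __name__ == "__main__":
    # Placeholder gene names, not real data.
    example = ["geneA\tgeneB\tgeneC", "geneD\tgeneE", "", "geneF"]
    mapping = assign_cluster_ids(example)
    for record in to_flat_records(mapping):
        print(record)
```

Keeping the reformatted file this simple would push all parsing complexity into Python and leave the C side with a single fixed record shape to read.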

Figure 4: Original gPICTviz visualization with clusters highlighted.

This visualization provides a gene cluster view of the gene networks by depicting each gene cluster as a sphere whose size represents the number of genes within that cluster. An edge between two clusters summarizes all single-gene connections between them, with line width encoding their number: a thicker edge represents more connections than a thinner edge. The three edge colors represent the different network relationships: two within-network relationships and one between-network relationship for a two-network graph. Reducing the lighting in the visual dims the edges and shifts the focus to the clusters. The visualization tool also provides zoom and rotate capabilities, and a search feature to find clusters containing specific gene IDs or gene ontology (GO) terms.
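The abstraction described above, from gene-level edges to cluster-level spheres and weighted edges, can be sketched as follows. This is an illustration under stated assumptions, not the gPICTviz implementation, which the poster says is written in C and OpenGL. The relationship labels ("net1", "net2", "between"), the cube-root radius rule, the linear width scaling, and the choice to drop edges inside a single cluster are all assumptions, and every function name is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# (gene_a, gene_b, relationship label); the labels are hypothetical.
Edge = Tuple[str, str, str]


def cluster_sizes(gene_to_cluster: Dict[str, int]) -> Counter:
    """Count the genes in each cluster (this drives sphere size)."""
    return Counter(gene_to_cluster.values())


def cluster_edges(edges: Iterable[Edge], gene_to_cluster: Dict[str, int]) -> Counter:
    """Count gene-level connections per (cluster, cluster, relationship)."""
    counts: Counter = Counter()
    for gene_a, gene_b, relation in edges:
        ca = gene_to_cluster.get(gene_a)
        cb = gene_to_cluster.get(gene_b)
        if ca is None or cb is None or ca == cb:
            continue  # assumption: skip unclustered genes and intra-cluster edges
        counts[(min(ca, cb), max(ca, cb), relation)] += 1
    return counts


def sphere_radius(n_genes: int, base: float = 1.0) -> float:
    """Assumed rule: cube root, so sphere volume grows linearly with gene count."""
    return base * n_genes ** (1.0 / 3.0)


def line_width(n_connections: int, max_connections: int,
               min_width: float = 1.0, max_width: float = 8.0) -> float:
    """Assumed rule: scale width linearly up to max_width at max_connections."""
    if max_connections <= 0:
        return min_width
    fraction = min(n_connections, max_connections) / max_connections
    return min_width + (max_width - min_width) * fraction


if __name__ == "__main__":
    # Placeholder genes and edges, not real data.
    g2c = {"a1": 0, "a2": 0, "b1": 1, "c1": 2}
    edges = [("a1", "b1", "between"), ("a2", "b1", "between"), ("b1", "a1", "net1")]
    merged = cluster_edges(edges, g2c)
    busiest = max(merged.values())
    for (ca, cb, relation), n in sorted(merged.items()):
        print(ca, cb, relation, n, line_width(n, busiest))
```

Keying the counter on the relationship label as well as the cluster pair keeps the three edge colors separable, so two clusters can be joined by up to three differently colored edges.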

Performance evaluation:

Networks     # of Clusters   # of Edges   FPS
Zma - Osa    667             67,039       ~30
Ath - Ath    1491            11,611       ~10

Conclusion

The cluster view for gPICTviz shows promise as a higher-level interface for exploring complex genetic systems. For the purposes of scaling this visualization tool to greater than two networks, it will be necessary to provide abstracted viewing levels, and this cluster view is the first step in that direction.

Future Work

- Improve lighting and force-directed layout
- Add interactivity such as clicking on a cluster to expand into nodal view
- Adjust code to utilize the GPU
- Integrate with Amari Lewis’ project to include gene ontology information and better the user interface and overall functionality
- Scale to more than two gene networks

Related Work

Interaction and Vizualization of Multiple layers of High Dimensional biological data. Amari Lewis1, Karan Sapra2, Kathleen Kyle3, Alex Feltus2, Melissa Smith2, Jill Gemmill2. 1Winston-Salem State University, 2Clemson University, 3Florida State University.

G3NA: A GPU Optimized Global Gene Network Alignment Tool. Karan Sapra1, Melissa C. Smith1, F. Alex Feltus2, Asher Sampong3, Joshua A. Levine4, Anagha Joshi1. Departments of 1Electrical and Computer Engineering and 2Genetics & Biochemistry, Clemson University, SC 29634, USA; 3Fort Valley State, Fort Valley, GA 31030, USA; 4School of Computing, Clemson University, SC 29634, USA.

Acknowledgements

Advanced Visualization Division

Research Question

Will an abstracted, gene cluster view for gPICTviz facilitate expert investigation of multispecies genetic systems?

Visualization Explained

Figure 1: Two A. thaliana gene networks

Figure 2: Dark side of maize-rice gene networks visual

Figure 3: Search feature highlights a cluster that contains “GRMZM2G161306”
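The search feature shown in Figure 3, which finds clusters containing a given gene ID or GO term, could be backed by a simple inverted index. The sketch below is one possible approach in Python rather than the tool's actual C code. The function names, the cluster numbers, and the GO annotations on the placeholder genes are all made up for illustration; only the gene ID “GRMZM2G161306” comes from the figure caption, and its cluster assignment here is invented.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def build_index(gene_to_cluster: Dict[str, int],
                gene_to_go: Dict[str, Iterable[str]]) -> Dict[str, Set[int]]:
    """Map every gene ID and GO term to the set of clusters that contain it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for gene, cluster_id in gene_to_cluster.items():
        index[gene].add(cluster_id)
        for term in gene_to_go.get(gene, ()):
            index[term].add(cluster_id)
    return dict(index)


def search(index: Dict[str, Set[int]], query: str) -> List[int]:
    """Return the sorted cluster IDs to highlight for a gene ID or GO term."""
    return sorted(index.get(query.strip(), set()))


if __name__ == "__main__":
    # Hypothetical assignments: cluster numbers and GO annotations are invented.
    gene_to_cluster = {"GRMZM2G161306": 3, "geneX": 5, "geneY": 3}
    gene_to_go = {"geneX": ["GO:0008150"], "geneY": ["GO:0008150"]}
    index = build_index(gene_to_cluster, gene_to_go)
    print(search(index, "GRMZM2G161306"))
    print(search(index, "GO:0008150"))
```

Building the index once at load time makes each lookup a single dictionary access, which matters when a search has to run between rendered frames.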
