Research resources: curating the new eagle-i discovery system

Post on 14-Jul-2015


Research resources: curating the new eagle-i discovery system
Nicole Vasilevsky1, Tenille Johnson2, Karen Corday2, Carlo Torniai1, Matthew Brush1, Scott Hoffmann1, Erik Segerdell1, Melanie L. Wilson1, Christopher J. Shaffer1, David Robinson1, and Melissa A. Haendel1**
1 Oregon Health & Science University, Library, Portland, Oregon
2 Harvard Medical School, Center for Biomedical Informatics, Cambridge, Massachusetts

www.eagle-i.net
Open source software available at: https://open.med.harvard.edu/display/eaglei/Software
eagle-i Ontology GoogleCode: http://code.google.com/p/eagle-i/

Acknowledgements
**We, the authors, represent the members and leaders of the eagle-i Curation team, and describe some of the efforts and products of all teams involved in the development of the eagle-i discovery system. We would like to thank the Resource Navigation team, led by Richard Pearse; the Software Build team, led by Daniela Bourges; and the Project Management team, led by Julie McMurry. We would also like to thank Jackie Wirz. We gratefully acknowledge NIH award #U24RR029825.

Semantic Web Entry and Editing Tool
Components of the eagle-i annotation tool, known by the acronym SWEET, are generated directly from the eagle-i ontology. The SWEET contains both annotation fields that are auto-populated using the ontology (purple box) and free text (orange box). Entrez Gene ID links out to the NCBI database (red box). Fields in the SWEET can also link records to other records in the repository, such as related publications or documentation (blue box). Users can request new terms be added to the ontology using the Term Request field.
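To make the idea of an ontology-generated form concrete, here is a minimal sketch, assuming a tiny hand-written stand-in for an ontology. The class names, property names, and the `build_form` helper are all hypothetical and are not taken from the eagle-i ontology or the SWEET code; the sketch only illustrates how fields might be derived as ontology-populated drop-downs, links to other records, or free text.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical stand-in for a small slice of an ontology.
# Names are illustrative only, not taken from the eagle-i ontology.
ONTOLOGY: Dict[str, dict] = {
    "Antibody": {
        "properties": {
            "host_organism": {"range": "Organism"},
            "related_publication": {"range": "Publication", "links_to_record": True},
            "additional_notes": {"range": None},
        }
    },
    "Organism": {"instances": ["Mus musculus", "Oryctolagus cuniculus"]},
    "Publication": {},
}


@dataclass
class FormField:
    name: str
    kind: str  # "dropdown", "record_link", or "free_text"
    choices: List[str]


def build_form(resource_type: str, ontology: Dict[str, dict]) -> List[FormField]:
    """Derive one form field per property declared on the resource type."""
    fields = []
    for name, spec in ontology[resource_type]["properties"].items():
        target = spec.get("range")
        if target is None:
            # No ontology range: the curator types free text.
            fields.append(FormField(name, "free_text", []))
        elif spec.get("links_to_record"):
            # The value is another record in the repository.
            fields.append(FormField(name, "record_link", []))
        else:
            # The drop-down is populated from the ontology itself.
            choices = list(ontology.get(target, {}).get("instances", []))
            fields.append(FormField(name, "dropdown", choices))
    return fields


for field in build_form("Antibody", ONTOLOGY):
    print(field.name, field.kind, field.choices)
```

Because the form is derived from the ontology in this sketch, adding a property or a new term to the ontology changes the form without any change to the form-building code.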

Ontological modeling of research resources

Data Curation at eagle-i

Development of data curation practices at eagle-i depended on the Resource Navigation team for data collection, the Curation team for ontology development and data QA, and the Software team for user interface design in an iterative process. Tools and documentation were developed to assist users and team members with each of these processes.

Lessons Learned
• Balance the data you need with the data you can get
• Documentation and quality assurance are iterative
• Tools and technology choices depend on the above

Decision trees assist with data entry and annotation of resources

Figure legend (the symbols are not preserved in this transcript):
• Required annotations
• Questions eliciting information for annotation
• Redirection to a different decision tree
• Higher value/priority annotations
• Medium value/priority annotations
• Lower value/priority annotations
• Drop down or annotation field examples
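The elements of such a decision tree (required annotations, questions, redirection to another tree, and prioritized annotations) can be sketched as a small data structure. This is only an illustration: the tree names, questions, field names, and priorities below are invented and do not reproduce the actual eagle-i decision trees.

```python
from typing import Dict, List, Tuple

# Hypothetical decision trees; every question and field name is invented.
TREES: Dict[str, dict] = {
    "instrument": {
        "required": ["name", "instrument_type"],
        "question": "Is this offered as a core facility service?",
        "yes": {"redirect": "service"},
        "no": {"annotate": [("manufacturer", "high"), ("model_number", "medium")]},
    },
    "service": {
        "required": ["name", "service_type"],
        "question": "Is there a fee for the service?",
        "yes": {"annotate": [("fee_url", "medium")]},
        "no": {"annotate": [("description", "low")]},
    },
}


def walk(
    tree_name: str, answers: Dict[str, bool], trees: Dict[str, dict] = TREES
) -> Tuple[str, List[str], List[Tuple[str, str]]]:
    """Follow yes/no answers through a tree.

    Returns the tree that was finally used, its required annotations,
    and the (field, priority) pairs reached at the leaf.
    """
    tree = trees[tree_name]
    node = tree
    while "question" in node:
        node = node["yes" if answers[node["question"]] else "no"]
        if "redirect" in node:
            # Hand the resource over to a different decision tree.
            return walk(node["redirect"], answers, trees)
    return tree_name, list(tree["required"]), list(node["annotate"])


answers = {
    "Is this offered as a core facility service?": True,
    "Is there a fee for the service?": False,
}
print(walk("instrument", answers))
```

In this sketch a redirect replaces the current tree entirely, so the required annotations returned are those of the tree the resource ends up in.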

The Ideal Scholarly Research Cycle

During the course of collecting information about research resources, which many laboratories were willing to share, we discovered that while larger core facilities routinely have resource and workflow organization strategies, primary research labs very rarely do. This creates barriers to reproducing experiments as well as to publishing and sharing resources. Giving labs organizational tools can help address these issues.

Provide scientists with the tools they need to record their resources during the course of research

 

How can we make this cycle more efficient?

o Researchers produce data and resources that lead to publications.

o Published data informs researchers of new experimental designs.

o Information about researchers, resources, data, and published papers is stored in various public repositories.

The goal of eagle-i is to make scientific research resources more visible via a federated network of institutional repositories. Using an ontology-driven approach for biomedical resource annotation and discovery, the Network currently includes resources from 23 institutions.
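As a rough illustration of ontology-driven discovery across a federated network, the sketch below searches several in-memory "repositories" and uses a toy class hierarchy so that a search for a general term also finds resources annotated with more specific terms. The institutions, records, hierarchy, and function names are all hypothetical; this is not the eagle-i search implementation.

```python
from typing import Dict, List, Tuple

# Hypothetical class hierarchy: parent term -> direct subclasses.
SUBCLASSES: Dict[str, List[str]] = {
    "instrument": ["microscope", "sequencer"],
    "microscope": ["confocal microscope"],
}

# Hypothetical institutional repositories, each holding its own records.
REPOSITORIES: Dict[str, List[dict]] = {
    "Institution A": [
        {"label": "Confocal microscope, core facility", "type": "confocal microscope"},
        {"label": "Anti-GFP antibody", "type": "antibody"},
    ],
    "Institution B": [
        {"label": "DNA sequencer", "type": "sequencer"},
    ],
}


def expand(term: str) -> List[str]:
    """Return the term followed by all of its descendants in the hierarchy."""
    found = [term]
    for child in SUBCLASSES.get(term, []):
        found.extend(expand(child))
    return found


def federated_search(term: str) -> List[Tuple[str, str]]:
    """Query every repository and merge the hits into one result list."""
    wanted = set(expand(term))
    hits = []
    for institution, records in REPOSITORIES.items():
        for record in records:
            if record["type"] in wanted:
                hits.append((institution, record["label"]))
    return hits


print(federated_search("instrument"))
```

Each repository keeps its own records in this sketch; only the shared vocabulary makes results from different institutions comparable.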

New initiatives with eagle-i
NCATS has funded two new projects that leverage eagle-i to further translational science. The first project aims to expand the breadth, quality, and discoverability of data about people and resources by harmonizing the ontologies of VIVO, eagle-i, and ShareCenter (www.ctsaconnect.org). The second project aims to expand the eagle-i platform to new CTSA institutions, and to publish resources as Linked Open Data.
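To show what publishing a resource as Linked Open Data can look like in practice, here is a minimal sketch that serializes one record as N-Triples using only the Python standard library. The `example.org` IRIs, the `usedBy` predicate, and the record itself are placeholders chosen for illustration; only the `rdf:type` and `rdfs:label` IRIs are standard W3C terms. It does not reflect the vocabulary or export format the project actually uses.

```python
from typing import List, Tuple

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def iri(value: str) -> str:
    """Wrap an IRI in angle brackets as N-Triples requires."""
    return f"<{value}>"


def literal(text: str) -> str:
    """Quote a plain string literal, escaping characters N-Triples reserves."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def to_ntriples(
    subject: str, type_iri: str, label: str, links: List[Tuple[str, str]]
) -> str:
    """Serialize one resource: its type, its label, and links to other IRIs."""
    lines = [
        f"{iri(subject)} {iri(RDF_TYPE)} {iri(type_iri)} .",
        f"{iri(subject)} {iri(RDFS_LABEL)} {literal(label)} .",
    ]
    for predicate, obj in links:
        lines.append(f"{iri(subject)} {iri(predicate)} {iri(obj)} .")
    return "\n".join(lines)


# Placeholder IRIs under example.org; not real eagle-i identifiers.
print(
    to_ntriples(
        "http://example.org/resource/42",
        "http://example.org/ontology/Instrument",
        "Confocal microscope",
        [("http://example.org/ontology/usedBy", "http://example.org/lab/7")],
    )
)
```

Giving every resource, lab, and ontology term its own IRI is what lets records from different institutions link to one another.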

[Figure labels: Biocuration; Data collection; User interface design; Ontology development; Curation guidelines; SPARQL query tool for QA; Ontology Browser; SWEET; Search application; Decision trees; Google Code]

The eagle-i workflow

[Figure labels: Search application; Annotation tool; Institutional repositories; Biocurator; Ontology; Request new terms; Request resources; eagle-i participating lab]

[Figure labels: Researcher; Resources and data; Publications; Public repositories (eagle-i, MODs, NIF, Entrez Gene...); Public repositories (PubMed, Google Scholar, Mendeley…); Professional networking (VIVO, Harvard Profiles, LinkedIn…)]

Major eagle-i resource types are shown as dark boxes. Persons and laboratories play a central role in eagle-i. Classes and properties are reused from pre-existing ontologies or created de novo. Examples of some of the relations between the classes are indicated.