Research resources: curating the new eagle-i discovery system

Nicole Vasilevsky¹, Tenille Johnson², Karen Corday², Carlo Torniai¹, Matthew Brush¹, Scott Hoffmann¹, Erik Segerdell¹, Melanie L. Wilson¹, Christopher J. Shaffer¹, David Robinson¹, and Melissa A. Haendel¹**

¹ Oregon Health & Science University, Library, Portland, Oregon
² Harvard Medical School, Center for Biomedical Informatics, Cambridge, Massachusetts

www.eagle-i.net
Open source software available at: https://open.med.harvard.edu/display/eaglei/Software
eagle-i Ontology GoogleCode: http://code.google.com/p/eagle-i/

Acknowledgements

**We, the authors, represent the members and leaders of the eagle-i Curation team, and describe some of the efforts and products of all teams involved in the development of the eagle-i discovery system. We would like to thank the Resource Navigation team, led by Richard Pearse; the Software Build team, led by Daniela Bourges; and the Project Management team, led by Julie McMurry. We would also like to thank Jackie Wirz. We gratefully acknowledge NIH award #U24RR029825.

Semantic Web Entry and Editing Tool

Components of the eagle-i annotation tool, known by the acronym SWEET, are generated directly from the eagle-i ontology. The SWEET contains both annotation fields that are auto-populated using the ontology (purple box) and free text fields (orange box). The Entrez Gene ID field links out to the NCBI database (red box). Fields in the SWEET can also link records to other records in the repository, such as related publications or documentation (blue box). Users can request that new terms be added to the ontology using the Term Request field.
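
To illustrate the idea of a form generated from an ontology, here is a minimal sketch in Python. It is not the SWEET implementation: the toy ontology, the property names, the example terms, and the widget names are all hypothetical and chosen only to mirror the four field types described above (ontology-populated, free text, external link-out, and record link), plus the Term Request field.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FormField:
    """One field of a generated annotation form."""
    label: str
    widget: str
    options: List[str] = field(default_factory=list)


# Hypothetical toy ontology: each resource type lists its properties.
# None of these labels or terms are taken from the real eagle-i ontology.
TOY_ONTOLOGY: Dict[str, List[dict]] = {
    "Antibody": [
        {"label": "Host organism", "kind": "ontology_term",
         "terms": ["Mus musculus", "Oryctolagus cuniculus"]},
        {"label": "Entrez Gene ID", "kind": "external_id"},
        {"label": "Related publication", "kind": "record_link"},
        {"label": "Description", "kind": "free_text"},
    ],
}

# Assumed mapping from property kind to form widget.
WIDGET_FOR_KIND = {
    "ontology_term": "dropdown",     # auto-populated from the ontology
    "external_id": "link_out",       # e.g. links to an external database
    "record_link": "record_picker",  # links to another repository record
    "free_text": "text_area",
}


def build_form(resource_type: str) -> List[FormField]:
    """Derive the form fields for a resource type from the toy ontology."""
    fields = [
        FormField(prop["label"], WIDGET_FOR_KIND[prop["kind"]],
                  list(prop.get("terms", [])))
        for prop in TOY_ONTOLOGY[resource_type]
    ]
    # Every form also gets a field for requesting new ontology terms.
    fields.append(FormField("Term Request", "text_area"))
    return fields


if __name__ == "__main__":
    for f in build_form("Antibody"):
        print(f.label, "->", f.widget, f.options)
```

Because the form is derived from the ontology rather than hard-coded, adding a property or a term to the ontology changes the form without any change to the form-building code.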

Ontological modeling of research resources

Data Curation at eagle-i

Data curation practices at eagle-i were developed through an iterative process that depended on the Resource Navigation team for data collection, the Curation team for ontology development and data QA, and the Software team for user interface design. Tools and documentation were developed to assist users and team members with each of these processes.
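
As one example of what a data QA step could look like, the sketch below flags records that lack required annotations. It is only an illustration under stated assumptions: the record layout, the field names, and the lists of required fields are hypothetical and are not taken from the eagle-i curation guidelines.

```python
from typing import Dict, List

# Hypothetical required annotations per resource type (assumed, not from eagle-i).
REQUIRED_FIELDS: Dict[str, List[str]] = {
    "Instrument": ["name", "resource type", "location"],
    "Antibody": ["name", "resource type", "antibody target"],
}

# Fallback for resource types not listed above (also an assumption).
DEFAULT_REQUIRED = ["name", "resource type"]


def missing_required(record: Dict[str, str]) -> List[str]:
    """Return the required fields that are absent or blank in a record."""
    resource_type = record.get("resource type", "")
    required = REQUIRED_FIELDS.get(resource_type, DEFAULT_REQUIRED)
    return [f for f in required if not record.get(f, "").strip()]


def qa_report(records: List[Dict[str, str]]) -> Dict[str, List[str]]:
    """Map each incomplete record's name to its missing required fields."""
    report: Dict[str, List[str]] = {}
    for record in records:
        missing = missing_required(record)
        if missing:
            report[record.get("name", "<unnamed>")] = missing
    return report


if __name__ == "__main__":
    sample = [
        {"name": "Confocal microscope", "resource type": "Instrument",
         "location": "Core facility A"},
        {"name": "Anti-GFP", "resource type": "Antibody",
         "antibody target": "  "},
    ]
    print(qa_report(sample))
```

A report like this can be rerun after each round of data entry, which fits the iterative character of documentation and quality assurance.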

Lessons Learned

• Balance the data you need with the data you can get
• Documentation and quality assurance are iterative
• Tools and technology choices depend on the above

Decision trees assist with data entry and annotation of resources

Figure legend (decision trees): symbols denote required annotations; questions eliciting information for annotation; redirection to a different decision tree; higher, medium, and lower value/priority annotations; and drop-down or annotation field examples.

The Ideal Scholarly Research Cycle

While collecting information about research resources, which many laboratories were willing to share, we discovered that larger core facilities routinely have resource and workflow organization strategies, but primary research labs very rarely do. This creates barriers to reproducing experiments as well as to publishing and sharing resources. Giving labs organizational tools can help address these issues.

Provide scientists with the tools they need to record their resources during the course of research

How can we make this cycle more efficient?

o  Researchers produce data and resources that lead to publications.

o  Published data informs researchers of new experimental designs.

o  Information about researchers, resources, data, and published papers is stored in various public repositories.

The goal of eagle-i is to make scientific research resources more visible via a federated network of institutional repositories. Using an ontology-driven approach for biomedical resource annotation and discovery, the Network currently includes resources from 23 institutions.

New initiatives with eagle-i

NCATS has funded two new projects that leverage eagle-i to further translational science. The first project aims to expand the breadth, quality, and discoverability of data about people and resources by harmonizing the ontologies of VIVO, eagle-i, and ShareCenter (www.ctsaconnect.org). The second project aims to expand the eagle-i platform to new CTSA institutions and to publish resources as Linked Open Data.

Figure labels (data curation diagram): Biocuration; Data collection; User interface design; Ontology development; Curation guidelines; SPARQL query tool for QA; Ontology Browser; SWEET; Search application; Decision trees; Google Code.

The eagle-i workflow

Figure labels (workflow diagram): Search application; Annotation tool; Institutional repositories; Biocurator; Ontology; Request new terms; Request resources; eagle-i participating lab.

Figure labels (scholarly research cycle): Researcher; Resources and data; Researcher; Publications.

Public repositories •  eagle-i •  MODs •  NIF •  Entrez Gene...

Public repositories •  PubMed •  Google Scholar •  Mendeley…

Professional networking: •  VIVO •  Harvard Profiles •  LinkedIn…

Major eagle-i resource types are shown as dark boxes. Persons and laboratories play a central role in eagle-i. Classes and properties are reused from pre-existing ontologies or created de novo. Examples of some of the relations between the classes are indicated.
