Annotating Experimental Records using Ontologies

Post on 25-Feb-2016


Olga Giraldo, Unal de Colombia/CIAT

Jael Garcia, Universität der Bundeswehr

Alexander Garcia, UAMS

Motivation and Research Question

• Knowledge-based approach to managing laboratory information
– It combines elements from the Semantic Web (SW), e.g. ontologies supporting organization and classification, with elements from Social Tagging Systems, e.g. collaboration and ad-hoc organization strategies.

• How can we semantically annotate laboratory records?

• How can we facilitate the coexistence of laboratory notebooks and electronic laboratory records?

Motivation and Research Question

• Easy to use, highly portable, easy to share, low cost…
• Great artifacts for supporting design
• Legal requirement

da Vinci

Mutis

Marie Curie

Research Question

• How can we facilitate the coexistence of laboratory notebooks and electronic laboratory records?

• How can we semantically annotate laboratory records?

Our Approach

• Documents should be able to “know about” their own content for automated processes to “know what to do” with them.

Semantics….

Materials and Methods

• Our scenario: supporting the annotation of experimental data for some of the processes routinely run at the Center for International Tropical Agriculture (CIAT) biotechnology laboratory

• 15 laboratory notebooks together with their corresponding electronic records, e.g. XLS files, outputs from lab equipment, etc.

• 10 biologists

• Direct non-intrusive observation: 6 months

• Ontology and prototype development: iterative and collaborative process

• Existing ontologies

Results

• Data types
• Rhetorical structure
• Ontologies
• Orchestration of ontologies
• Tags and ontologies
• Lessons

Results

• Data types
– Manuscript
– Digital
– Digital data with manuscript annotations

Results

• Manuscript
– Lists
– To-dos
– How-tos (protocols)
– Incomplete results
– Dates
– Formulas
– Electronic paths
– Sources for information (URLs)

Results

• Digital
– Photos
– Lists
– Incomplete results
– Protocols
– Figures
– Sequences

Results

• Digital + Manuscript
– Digital files and print-outs tagged with manuscript information.

Results

• We identified the rhetorical structure implicit in those laboratory notebooks we studied

• And the metadata describing such structure

Lab Notebook

Body: metadata describing an experimental activity

Header: metadata describing a lab notebook

Title (DC)

Notes (AgMes)

Date of creation (DC)

Laboratory notebook number (M4L)

Creator (DC/AgMes)

Date of finalization (M4L)

Language (DC)

Project (OBI/AGROVOC)

Laboratory procedure (M4L)

Comments (BioPortal, NCIt, SNOMED)

Date (DC)

Page number (M4L)

Purpose (M4L)

Security measurements (M4L)

Outcome (NCIt)

Rhetorical structure: Header, Body.

Materials & Methods, experimental design

Materials & Methods: Samples, Reagents, Assays, Equipment and supplies.

Experimental design

Samples: DNA, RNA, whole plant, etc. (OBI, CHEBI, PO)

Reagents: buffer, dNTP mix (CHEBI, M4L)

Assay: DNA extraction, PCR, gel electrophoresis (OBI, M4L).

Equipment & supplies: freezer, centrifuge, shaker, glove, etc. (OBI, PEO, SEP, SNOMED, BIRNLex, M4L).

Experimental design: (OBI, M4L)

Protocol (OBI)

Recorded by (M4L)
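
The mapping between the components of an experimental record and the vocabularies that annotate them can be written down as a small lookup table. The sketch below is only an illustration: the vocabulary lists are copied from the slide, while the dictionary layout and the helper names `components_using` and `shared_vocabularies` are assumptions made for this example, not part of M4L or of the prototype.

```python
# Vocabulary prefixes listed on the slide for each component of an
# experimental record. The dictionary layout is an assumption made
# for illustration; it is not taken from the M4L ontology itself.
COMPONENT_VOCABULARIES = {
    "Samples": ["OBI", "CHEBI", "PO"],
    "Reagents": ["CHEBI", "M4L"],
    "Assay": ["OBI", "M4L"],
    "Equipment and supplies": ["OBI", "PEO", "SEP", "SNOMED", "BIRNLex", "M4L"],
    "Experimental design": ["OBI", "M4L"],
    "Protocol": ["OBI"],
    "Recorded by": ["M4L"],
}


def components_using(vocabulary):
    """Return, in alphabetical order, the components annotated with a vocabulary."""
    return sorted(
        component
        for component, vocabularies in COMPONENT_VOCABULARIES.items()
        if vocabulary in vocabularies
    )


def shared_vocabularies(first, second):
    """Return the vocabularies that two components have in common."""
    common = set(COMPONENT_VOCABULARIES[first]) & set(COMPONENT_VOCABULARIES[second])
    return sorted(common)


if __name__ == "__main__":
    print(components_using("CHEBI"))
    print(shared_vocabularies("Samples", "Reagents"))
```

A table like this makes it easy to see which components depend on terms that only M4L provides and which rely on externally maintained ontologies.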

We focused on: DNA extraction, PCR and Electrophoresis

DNA Extraction

A typical process in a plant biotechnology laboratory

Mechanical pulverization of plant material

Results

• M4L: our ontology for the experimental processes we studied
– Based on OBI.
– Terms proposed to OBI: 197, including new terms plus terms from other ontologies
– Other terms will be proposed to other ontologies, e.g. ChEBI, GO, PO

No. | Ontology | N. of concepts
0 | Metadata for Laboratory Notebook (M4L) | 149
1 | Chemical Entities of Biological Interest (CHEBI) (Degtyarenko et al., 2008) | 87
2 | Ontology for Biomedical Investigations (OBI) (Brinkman et al., 2010) | 59
3 | Medical Subject Headings ontology (MSH) (Moerchen et al., 2008) | 17
4 | Gene Ontology (GO) (Ashburner et al., 2000) | 14
5 | Sample Processing and Separation Techniques (SEP) (http://psidev.info/index.php?q=node/312) | 6
6 | BIRN Project lexicon (BIRNLex) (Bug et al., 2008) | 6
7 | Gene Regulation Ontology (GRO) (Beisswanger et al., 2008) | 5
8 | National Cancer Institute thesaurus (NCIt) (Ceusters et al., 2005) | 5
9 | Plant Ontology Consortium (POC) (Jaiswal et al., 2005) | 5
10 | SNOMED-CT (http://www.nlm.nih.gov/research/umls/Snomed/snomed_main.html) | 5
11 | BioTop Ontology (Beisswanger et al., 2007) | 1
12 | Foundational Model of Anatomy (FMA) (Rosse and Mejino, 2003) | 1
13 | Ontology for Genetic Interval (OGI) (Lin et al., 2010) | 1
14 | Parasite Experiment Ontology (PEO) (http://wiki.knoesis.org/index.php/Parasite_Experiment_ontology) | 1
15 | Proteomics Data and Process Provenance (PDPP) (Sahoo et al., 2006) | 1
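
To put the table in perspective, the counts can be summed to see how much of the vocabulary is local to M4L and how much is reused. This is a minimal sketch that assumes each row counts distinct concepts (the slide does not say whether rows overlap); the numbers are copied from the table, and the names `CONCEPT_COUNTS`, `total_concepts`, `reused_concepts` and `share` are introduced here only for illustration.

```python
# Concept counts per ontology, copied from the table above.
CONCEPT_COUNTS = {
    "M4L": 149, "CHEBI": 87, "OBI": 59, "MSH": 17, "GO": 14,
    "SEP": 6, "BIRNLex": 6, "GRO": 5, "NCIt": 5, "POC": 5,
    "SNOMED-CT": 5, "BioTop": 1, "FMA": 1, "OGI": 1, "PEO": 1, "PDPP": 1,
}


def total_concepts(counts):
    """Sum the concepts over all ontologies (assumes no overlap between rows)."""
    return sum(counts.values())


def reused_concepts(counts, local="M4L"):
    """Count the concepts that come from ontologies other than the local one."""
    return total_concepts(counts) - counts[local]


def share(counts, name):
    """Fraction of all concepts contributed by one ontology."""
    return counts[name] / total_concepts(counts)


if __name__ == "__main__":
    print(total_concepts(CONCEPT_COUNTS))    # 363
    print(reused_concepts(CONCEPT_COUNTS))   # 214
    print(f"{share(CONCEPT_COUNTS, 'M4L'):.1%}")
```

Under that assumption, 214 of the 363 concepts come from existing ontologies, which is consistent with the stated approach of reusing and extending them.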

Results

• We have structured the descriptive layers by reusing and extending existing ontologies.

• To support annotation within our scenario we have identified three main layers, namely:
– i) that related to the document itself,
– ii) the annotation layer, and
– iii) that related to the experiment.

Results

• Orchestration of ontologies: Annotation Ontology

The Annotation Ontology (AO) is a vocabulary for several types of annotation (comments, entity annotations or semantic tags, textual annotations or classic tags, notes, examples, errata...) on any kind of electronic document (text, images, audio, tables...) and on document parts. AO does not provide a domain ontology; instead it encourages the reuse of existing ones, so as not to break the Semantic Web principle of scalability.

[Figure: example annotations (ANNOT1, ANNOT2) modelled with the Annotation Ontology. Labels recoverable from the diagram:]

– Annotation: ANNOT1, ANNOT2; Qualifier, Definition (rdf:type, rdfs:subClassOf)
– Selector: InitEndCornerSelector, ImageSelector; aos:init (304,507), aos:end (360,618); ao:context
– Provenance: pav:createdBy http://www.tags4lab.org/foaf.rdf#olga.giraldo (foaf:Person); pav:createdOn June 1, 2010
– Topic: ao:hasTopic GenBank:AB005238; name; "Partial sequence on psy promoter"
– Document: aof:annotatesDocument, aof:onDocument; http://www.ncbi.nlm.nih.gov/pubmed/12520345
– MOAT: moat:Tag, tags:name, moat:tagMeaning, moat:Meaning, moat:hasMeaning, aoex:hasMoatMeaning; ann:body
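
The labels in the diagram can be read as a set of subject, predicate, object statements. The sketch below expresses one annotation that way using plain Python tuples, with no RDF library involved. The prefixed property names and the literal values are taken from the diagram, but how they connect to each other is an assumption (the original figure is not recoverable), and the `ex:` identifiers, the `ao:Annotation` type statement, the ISO form of the date, and the helper names `build_annotation` and `objects` are hypothetical.

```python
from collections import namedtuple

# A statement in the annotation graph: subject, predicate, object.
Triple = namedtuple("Triple", ["subject", "predicate", "obj"])


def build_annotation(annotation_id, document, topic, creator, created_on, init, end):
    """Build the triples for one image-region annotation.

    The graph shape is an assumed reading of the diagram labels,
    not a verified instance of the Annotation Ontology.
    """
    selector = annotation_id + "/selector"  # hypothetical identifier
    return [
        Triple(annotation_id, "rdf:type", "ao:Annotation"),
        Triple(annotation_id, "aof:annotatesDocument", document),
        Triple(annotation_id, "ao:hasTopic", topic),
        Triple(annotation_id, "pav:createdBy", creator),
        Triple(annotation_id, "pav:createdOn", created_on),
        Triple(annotation_id, "ao:context", selector),
        Triple(selector, "rdf:type", "aos:InitEndCornerSelector"),
        Triple(selector, "aos:init", init),
        Triple(selector, "aos:end", end),
    ]


def objects(triples, subject, predicate):
    """Return every object stated for a given subject and predicate."""
    return [t.obj for t in triples if t.subject == subject and t.predicate == predicate]


# Values copied from the diagram; "ex:ANNOT1" is a made-up identifier.
ANNOT1 = build_annotation(
    annotation_id="ex:ANNOT1",
    document="http://www.ncbi.nlm.nih.gov/pubmed/12520345",
    topic="GenBank:AB005238",
    creator="http://www.tags4lab.org/foaf.rdf#olga.giraldo",
    created_on="2010-06-01",
    init=(304, 507),
    end=(360, 618),
)

if __name__ == "__main__":
    print(objects(ANNOT1, "ex:ANNOT1", "ao:hasTopic"))
```

Keeping the selector as a separate node mirrors the diagram's separation between what is annotated (a region of an image) and what is said about it (the topic and its provenance).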

Results

• The AO structures both the semantic annotations and the tags generated by users.
– In this way we support complex SPARQL queries involving several ontologies, for instance:

• "Retrieve from the eLabBook the pages tagged by Tim Andrews or Lisa Watson with the tags rice and iron for which there is a LIMS data entry"
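
The logic of that example query can be illustrated without a triple store. The sketch below applies the same three conditions (tagger, tags, LIMS entry) to an in-memory list of pages. It is not the SPARQL query used in the prototype: the `Page` structure, the sample data and the function name `find_pages` are hypothetical, and it assumes one reading of the request, namely that the tags applied by the named users, taken together, must include every required tag.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class Page:
    """A hypothetical eLabBook page with per-user tags and an optional LIMS entry."""
    page_id: str
    tags_by_user: Dict[str, Set[str]] = field(default_factory=dict)
    lims_entry: Optional[str] = None


def find_pages(pages, users, required_tags):
    """Return ids of pages that the given users tagged with all required tags
    and that are linked to a LIMS data entry."""
    required = set(required_tags)
    matches = []
    for page in pages:
        # Collect every tag applied to this page by any of the named users.
        applied = set()
        for user in users:
            applied |= page.tags_by_user.get(user, set())
        if required <= applied and page.lims_entry is not None:
            matches.append(page.page_id)
    return matches


# Made-up sample data, for illustration only.
EXAMPLE_PAGES = [
    Page("page-1", {"Tim Andrews": {"rice", "iron"}}, "LIMS-001"),
    Page("page-2", {"Lisa Watson": {"rice"}}, "LIMS-002"),          # missing "iron"
    Page("page-3", {"Lisa Watson": {"rice", "iron"}}, None),        # no LIMS entry
    Page("page-4", {"Another User": {"rice", "iron"}}, "LIMS-004"), # other tagger
]

if __name__ == "__main__":
    print(find_pages(EXAMPLE_PAGES, ["Tim Andrews", "Lisa Watson"], ["rice", "iron"]))
```

In the approach described here, the same conditions would be expressed as graph patterns over the annotation, tagging and LIMS vocabularies, which is what allows a single query to span several ontologies.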

Concluding Remarks

• Although several electronic laboratory notebooks (ELNs) have been proposed and replacing paper-based records has been a consistent trend for several years, the technology has not yet been widely adopted. Laboratory Information Management Systems (LIMS) combined with paper-based laboratory notebooks remain in common use, particularly in academic environments.

Concluding Remarks

• Sharing and organizing information happens on a concept basis
– Researchers studying genes involved in iron transport share information with those who undertake nutritional studies assessing the effects of iron intake in human populations
– Clustering information based on concepts

Concluding Remarks

• Simple tagging mechanisms proved to be valuable resources for organizing information
– Tag clouds were used as tables of contents (TOCs)
– Tags were also used to support a quick view of laboratory pages
– Tags tend to stabilize over time
– Tags were a valuable source of terms and of evidence (use cases) for those terms

Concluding Remarks

• Time is difficult to model
• Incremental prototyping and participatory design were key: community engagement
• Limitations in the technology:
– Tablets, electronic pen, first-generation iPad, now Motorola XOOM
– Browser compatibility
• Laboratory notebooks look like specialized wikis

Future Work

• Focus on one technology: Android OS
• Semantic LIMS
• Support the whole cycle (LIMS record, notebook, machine-generated data)
• Automatic annotation of machine-generated data
• Adopt minimal amounts of information
• Adopt techniques from Personal Information Management approaches
• Look more like a wiki

Acknowledgments

• John Bateman, Oscar Corcho, Joe Tohme, Cesar Montana, Alberto Labarga

• The CIAT biotech lab