Pax Terminologica

Barry Smith

Institute for Formal Ontology and Medical Information Science, Saarland University, Saarbrücken

Overview
• systems for semantic annotation
• linguistics vs. science
• semantic annotation in biomedical informatics
• improving systems for semantic annotation
• conclusions

The Penn Treebank Project

annotates naturally occurring text for linguistic structure, producing skeletal parses showing syntactic and semantic information in tree form
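A skeletal parse in tree form is commonly written as a bracketed string. The following is a minimal sketch, assuming a simple bracket notation with one label per node; the example sentence is invented for illustration and is not taken from the Treebank corpus, and `parse_bracketed` and `leaves` are hypothetical helper names.

```python
def parse_bracketed(text):
    """Parse a bracketed parse string into nested lists: [label, child, ...]."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read_node():
        nonlocal pos
        assert tokens[pos] == "("
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read_node())
            else:
                children.append(tokens[pos])  # a word at a leaf
                pos += 1
        pos += 1  # consume the closing ")"
        return [label] + children

    return read_node()


def leaves(tree):
    """Return the words at the leaves of the tree, left to right."""
    words = []
    for child in tree[1:]:
        if isinstance(child, list):
            words.extend(leaves(child))
        else:
            words.append(child)
    return words


# Illustrative sentence (an assumption, not corpus data).
example = "(S (NP (DT the) (NN cell)) (VP (VBZ divides)))"
tree = parse_bracketed(example)
```

Reading the leaves back with `leaves(tree)` recovers the original word sequence, which is one quick way to check that an annotation has not altered the text it annotates.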

Automatic Content Extraction Program (ACE)

develops text corpora in English, Chinese and Arabic annotated for entities, the relations among them and the events in which they participate.

High Accuracy Retrieval from Documents (HARD)

creates corpora and annotations including topics, metadata and relevance judgements

Annotation Graph Toolkit (AGTK)

formal framework for representing linguistic annotations of time series data.

TimeML

robust specification language for markup of natural language to support:
• time stamping of events (identifying and anchoring in time);
• ordering events with respect to one another
• reasoning about persistence

SpaceML

provides facilities for annotating
• category attributions to spatial regions (self-connected, bounded, regular, etc.)
• ascription to regions of topological, distance, morphological and orientation relations;
• the definition of a region in terms of its boundary.

WordNet

annotates English nouns, verbs, adjectives and adverbs to synonym sets, each representing one underlying lexical concept.

FrameNet

documents the range of semantic and syntactic combinatory possibilities (valences) of each word in each of its senses

is there order in this chaos?

ISO/TC 37 / SC 4 N 076

Ide, N., Romary, L., de la Clergerie, E. (2003). International Standard for a Linguistic Annotation Framework. HLT-NAACL 2003 (Edmonton)

OntoGloss (influenced by ISO Linguistic Annotation Framework)

an ontology-based annotation tool that uses pre-defined terms in an ontology to mark up a document

No standard portal for semantic annotation tools/projects (?)

Purposes of semantic annotation
• information retrieval (incl. semantic indexing = answering queries that use words not used in the text, including words from other languages)
• automatic translation
• disambiguation
• topic extraction and text summarization
• information integration
• reasoning
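Semantic indexing, as described in the list above, can be sketched by indexing documents under concepts rather than words. The following is a minimal illustration, assuming a hand-written table of synonym sets (`SYNSETS`, invented for this example) and crude substring matching; a real system would draw its synonym sets from a lexical resource or ontology and would tokenize properly.

```python
# Hypothetical synonym sets, including one German term per concept.
SYNSETS = [
    {"heart attack", "myocardial infarction", "herzinfarkt"},
    {"kidney", "niere"},
]


def concept_ids(text):
    """Return the indices of all synonym sets with a member occurring in text."""
    lowered = text.lower()
    return {i for i, synset in enumerate(SYNSETS)
            if any(term in lowered for term in synset)}


def build_index(documents):
    """Map each concept id to the set of document ids annotated with it."""
    index = {}
    for doc_id, text in documents.items():
        for cid in concept_ids(text):
            index.setdefault(cid, set()).add(doc_id)
    return index


def search(index, query):
    """Retrieve documents that share at least one concept with the query."""
    hits = set()
    for cid in concept_ids(query):
        hits |= index.get(cid, set())
    return hits


# Invented documents for illustration.
documents = {
    "doc1": "Patient admitted after myocardial infarction.",
    "doc2": "Ultrasound of the left kidney was normal.",
}
index = build_index(documents)
```

With this index, the query "heart attack" retrieves doc1 although neither word occurs in it, and the German query "Niere" retrieves doc2.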

for linguistics
• fiction no less important than fact
• English has no privileged status
• regimentation not allowed
• annotation frameworks may be competitive
• cross-framework consistency is not important

for science
• factual discourse alone important
• English is language par excellence
• regimentation is allowed
• goal of truth: to create a single computer-processable map of reality
• truth is one
• must strive for consistency of annotations and additivity of annotation frameworks

for science

must end the terminology wars

Plant Ontology (PO)

cell =def. structural and physiological unit of a plant

what should PO do when it needs to study bacteria in plants?

answer: all shall use the word ‘cell’ to mean the same thing!

(all = in biology)

the ideal (of additivity)
• WordNet for single word forms
• FrameNet for valencies/combination forms
• SpaceNet for spatial structures
• TimeNet for temporal structures
• ChemNet for chemical structures
• CellNet for cellular structures
• etc.

a scientific problem: huge swarms of biomedical data at different granularities, from molecule to clinic

methods for data integration needed to enable reasoning across data at multiple granularities

(genomic medicine ...)

orthodox solutions to this problem

dumb statistical number-crunching

or: Semantic Web, Unified Medical Language System (UMLS), Moby, etc.

let a million flowers bloom and rely on mappings between already existing controlled vocabularies/annotation systems

an alternative solution
• use the peer-reviewed biomedical literature
• contains both textual descriptions of biological functions (incl. diseases) and references to entities represented in the biochemical databases
• use high-quality semantic annotations of the former to integrate across the latter
• the Gene Ontology

The methodology of annotations

Model organism databases employ scientific curators, who use the experimental observations reported in the biomedical literature to link gene products (such as proteins) with GO terms in annotations.
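The link a curator records can be pictured as a small record tying a gene product to an ontology term, together with the literature source that supports it. The sketch below is an assumption about what such a record might minimally hold, not the format any particular database uses; all identifiers and references are placeholders invented for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    gene_product: str  # identifier of the annotated gene product
    term_id: str       # ontology term the curator selected
    evidence: str      # curator's label for the kind of supporting evidence
    reference: str     # literature source reporting the observation


# Placeholder records (invented, not real database entries).
annotations = [
    Annotation("ProteinA", "TERM:0001", "experimental", "REF:1"),
    Annotation("ProteinB", "TERM:0001", "experimental", "REF:2"),
    Annotation("ProteinA", "TERM:0002", "experimental", "REF:3"),
]


def products_by_term(records):
    """Group gene products under the ontology term they are annotated with."""
    index = defaultdict(set)
    for rec in records:
        index[rec.term_id].add(rec.gene_product)
    return dict(index)
```

Grouping by term is what lets separate databases be queried together: any two records that share a term identifier are, to that extent, about the same thing.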

The process of annotations
• leads to improvements and extensions of the ontology, which in turn leads to better annotations
• a virtuous cycle of improvement in the quality and reach of both future annotations and the ontology itself,
• yielding a slowly growing computer-interpretable map of biological reality within which major databases are automatically integrated in semantically searchable form

need to extend GO by means of other ontologies, e.g. Cell Ontology, via integrated definitions

Cell type:
id: CL:0000062
name: osteoblast
def: "A bone-forming cell which secretes an extracellular matrix. Hydroxyapatite crystals are then deposited into the matrix to form bone." [MESH:A.11.329.629]
is_a: CL:0000055
relationship: develops_from CL:0000008
relationship: develops_from CL:0000375

GO + Cell type = New Definition

Osteoblast differentiation: Processes whereby an osteoprogenitor cell or a cranial neural crest cell acquires the specialized features of an osteoblast, a bone-forming cell which secretes extracellular matrix.
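The osteoblast entry on this slide is a sequence of tag-value pairs, which makes it straightforward to read into a data structure before combining it with a GO term. The following is a minimal sketch that handles only the tags shown on the slide, assuming one pair per line separated by a colon and a space; `parse_stanza` is a hypothetical helper, not part of any OBO library.

```python
STANZA = '''id: CL:0000062
name: osteoblast
def: "A bone-forming cell which secretes an extracellular matrix. Hydroxyapatite crystals are then deposited into the matrix to form bone." [MESH:A.11.329.629]
is_a: CL:0000055
relationship: develops_from CL:0000008
relationship: develops_from CL:0000375'''


def parse_stanza(text):
    """Collect tag-value pairs; tags may repeat, so values are kept in lists."""
    term = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first ": " only, since values themselves contain colons.
        tag, _, value = line.partition(": ")
        term.setdefault(tag, []).append(value)
    return term


term = parse_stanza(STANZA)

# Pull out the targets of the develops_from relationships.
develops_from = [value.split()[1] for value in term["relationship"]
                 if value.split()[0] == "develops_from"]
```

The two `develops_from` targets are the identifiers that the new definition spells out in words as the cell types from which an osteoblast develops.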

need to extend GO also to semantic annotation of clinical literature

unfortunately, available (UMLS) clinical vocabularies are of variable quality and low mutual consistency

need for prospective standards to assure consistency and high quality

create rules for high-quality controlled vocabularies for the annotation of scientific literature

make everyone follow these rules: regimentation!

First step: a shared portal for (so far) 58 ontologies (low regimentation)

http://obo.sourceforge.net

Second step: The OBO Foundry

http://obofoundry.org/

The OBO Foundry

scientific standards and principles-based coordination of systems for semantic annotation of biomedical literature to create a single interoperable family of gold standard reference ontologies

The OBO Foundry: a subset of OBO ontologies, whose developers have agreed in advance to accept a common set of principles designed to ensure
– formal robustness
– stability
– compatibility
– interoperability
– support for logic-based reasoning

Custodians
• Michael Ashburner (Cambridge)
• Suzanna Lewis (Berkeley)
• Barry Smith (Buffalo/Saarbrücken)

A prospective standard

designed to guarantee interoperability of ontologies from the very start

established March 2006; already 13 OBO ontologies have joined the Foundry and are being correspondingly reformed; three new ontologies are being constructed ab initio in its terms

Initial Candidate Members
– GO Gene Ontology
– CL Cell Ontology
– SO Sequence Ontology
– ChEBI Chemical Entities of Biological Interest
– PATO Phenotype (Quality) Ontology
– FuGO Functional Genomics Investigation Ontology
– FMA Foundational Model of Anatomy
– RO Relation Ontology
– CARO Common Anatomy Reference Ontology
– PrO Protein Ontology
– RnaO RNA Ontology

Under development
– Disease Ontology
– Mammalian Phenotype Ontology
– OBO-UBO / Ontology of Biomedical Reality
– Organism (Species) Ontology
– Plant Trait Ontology
– Environment Ontology
– Behavior Ontology
– Biomedical Image Ontology
– Clinical Trial Ontology

CRITERIA

• The ontology is open and available to be used by all.

• The ontology is in, or can be instantiated in, a common formal language.

• The developers of the ontology agree in advance to collaborate with developers of other OBO Foundry ontologies where domains overlap.

CRITERIA

• The developers of each ontology commit to its maintenance in light of scientific advance, and to soliciting community feedback for its improvement.

• They commit to working with other Foundry members to ensure that, for any particular domain, there is community convergence on a single controlled vocabulary.

CRITERIA

• The ontology possesses a unique identifier space within OBO.

• The ontology provider has procedures for identifying distinct successive versions.

• The ontology includes textual definitions for all terms.
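Two of these criteria lend themselves to mechanical checking: that every term identifier falls inside the ontology's own identifier space, and that every term carries a textual definition. The sketch below assumes identifiers of the form prefix, colon, digits (as in the CL and GO identifiers used elsewhere in this talk); the `XO` and `YO` prefixes, the terms, and the `check_terms` helper are all invented for illustration and are not a Foundry tool.

```python
import re

# Assumed identifier shape: alphabetic prefix, colon, numeric local part.
ID_PATTERN = re.compile(r"^([A-Za-z]+):(\d+)$")


def check_terms(prefix, terms):
    """Return a list of problems found in a {term_id: definition} mapping."""
    problems = []
    for term_id, definition in terms.items():
        match = ID_PATTERN.match(term_id)
        if match is None or match.group(1) != prefix:
            problems.append(f"{term_id}: outside the {prefix} identifier space")
        if not definition or not definition.strip():
            problems.append(f"{term_id}: missing textual definition")
    return problems


# Made-up terms for illustration only.
terms = {
    "XO:0000001": "A sample definition.",
    "XO:0000002": "",
    "YO:0000003": "Defined, but carrying another ontology's prefix.",
}
report = check_terms("XO", terms)
```

A check of this kind only tests that a definition is present; whether the definition is any good remains a matter for human review.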

CRITERIA

• The ontology has a clearly specified and clearly delineated content.

• The ontology is well-documented.

• The ontology has a plurality of independent users.

CRITERIA

• The ontology uses relations which are unambiguously defined following the pattern of definitions laid down in the OBO Relation Ontology.*

*Genome Biology 2005, 6:R46

OBO Relation Ontology

Foundational: is_a, part_of

Spatial: located_in, contained_in, adjacent_to

Temporal: transformation_of, derives_from, preceded_by

Participation: has_participant, has_agent
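Formally defined relations matter because they license inference. As a minimal sketch, assuming part_of is treated as transitive, the code below derives the assertions that follow from a handful of stated ones; the toy anatomy is illustrative and is not taken from any Foundry ontology.

```python
def transitive_closure(pairs):
    """Return all (x, z) pairs reachable by chaining (x, y) and (y, z)."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(closure):
            for (c, d) in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


# Illustrative part_of assertions (an assumption for this example).
part_of = {
    ("nucleolus", "nucleus"),
    ("nucleus", "cell"),
}

# Assertions that were never stated but follow from transitivity.
inferred = transitive_closure(part_of) - part_of
```

If two ontologies use part_of with the same definition, the same closure can be computed over their combined assertions, which is the sense in which shared relations support reasoning across ontologies.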

analogy with FrameNet

• the constituent ontologies in the OBO Foundry are focused overwhelmingly on single nouns

• the OBO Relation Ontology is designed to ensure a common structure of relations shared by all Foundry ontologies – comparable to SpaceML, TimeML ...

• need something like (Bio)FrameNet to pull the different levels of granularity together

CRITERIA

• Further criteria will be added over time in order to bring about a gradual improvement in the quality of the ontologies in the Foundry

GOALS

• semantic alignment of OBO Foundry ontologies through a common system of formally defined relations

• to enable reasoning both within and across ontologies, and thus also within and between the literature annotated in its terms

• and thus also to support reasoning across associated data

GOALS

• to promote re-usability of data

• if data-schemas are formulated using a single well-integrated framework for semantic annotation in widespread use, then this data will to this degree itself become more widely accessible and usable

GOALS

• to help in creating better mappings e.g. between human and model organism phenotypes: S Zhang, O Bodenreider, “Alignment of Multiple Ontologies of Anatomy: Deriving Indirect Mappings from Direct Mappings to a Reference Ontology”, AMIA 2005
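The idea named in the cited title, deriving indirect mappings from direct mappings to a reference ontology, can be sketched in a few lines. This is a simplified illustration under the assumption that each term maps to at most one reference term; it is not the method of the cited paper, and all term names are invented.

```python
def indirect_mappings(source_to_ref, target_to_ref):
    """Pair terms of two ontologies that map to the same reference term."""
    ref_to_target = {}
    for target_term, ref_term in target_to_ref.items():
        ref_to_target.setdefault(ref_term, set()).add(target_term)

    pairs = set()
    for source_term, ref_term in source_to_ref.items():
        for target_term in ref_to_target.get(ref_term, ()):
            pairs.add((source_term, target_term))
    return pairs


# Hypothetical direct mappings to a shared reference ontology.
human_to_ref = {"human:heart": "ref:heart", "human:lung": "ref:lung"}
mouse_to_ref = {"mouse:heart": "ref:heart", "mouse:tail": "ref:tail"}

mappings = indirect_mappings(human_to_ref, mouse_to_ref)
```

With a reference ontology in the middle, each new ontology needs only one set of direct mappings rather than one per existing ontology.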

GOALS

• to introduce the scientific method into the development of semantic annotation frameworks

• to introduce some of the features of scientific peer review into biomedical ontology development

GOALS

• to aid literature search: http://www.gopubmed.org/

• to subvert the current policy of ad hoc creation of new annotation schemas by each clinical research group by providing a common shared framework

GOALS

• to use the Foundry ontologies as a benchmark for improving existing terminologies

• to create controlled vocabularies for semantic annotation of clinical trial records, scientific journal articles, ...

GOALS

• to create an evolving map-like computable representation of the entire domain of biomedical reality

• to create the conditions for a step-by-step evolution towards high quality ontologies in the biomedical domain

• which will serve as stable attractors for clinical and biomedical researchers in the future

GOALS

• to end the terminology wars; and to advance regimentation of clinical and other vocabularies in a scientific spirit

Conclusion 1
• existing linguistic resources for semantic annotation are scattered to the four winds
• need for something like the OBO Library to ensure that the different available tools are available for comparison and alignment

Conclusion 2
• linguists developing tools for semantic annotation with scientific purposes need something like the Foundry to ensure a complete set of interoperable tools which allow for additivity of annotations

the ideal
• BioWordNet for single word forms
• SpaceNet for spatial structures
• TimeNet for temporal structures
• ChemNet for chemical structures
• CellNet for cellular structures
• BioFrameNet for valencies/combination forms
