An Introduction to Ontologies
Contributors: Melissa Haendel, Chris Mungall, David Osumi-Sutherland
Setting the stage
1. What is an ontology?
2. Understanding ontologies
3. Anatomy of an ontology class
4. Upper ontologies
5. How are ontologies used?
[Figure: the resources MouseEcotope, GlyProt, DiabetInGene, and GluChem, shown with the shared term "sphingolipid transporter activity"]
Common controlled vocabularies indicate the same meaning under different annotation circumstances
What is a controlled vocabulary?
Any closed, prescribed list of terms used for classifying data.
Key Features:
• Terms are not usually defined.
• Relationships between the terms are not usually defined.
• Can be a list.
Here is a CV of wines: Pinot noir, red, chardonnay, Chianti, Bordeaux, Riesling… These terms are of different kinds (color, location, varietal), yet they all sit together in one flat list.
Another example would be the list of map locations at the end of your gazetteer.
What is a Taxonomy?
Any controlled vocabulary that is arranged in a hierarchy.
Key Features:
• Terms are not usually defined.
• Relationships between the terms are not usually defined.
• Terms are arranged in a hierarchy.
Here is a wine taxonomy:
Wine
  Red
    merlot
    zinfandel
    cabernet
    pinot noir
  White
    chardonnay
    pinot gris
    Riesling
What is a Thesaurus?
A taxonomy that contains additional information about use of the terms.
Key Features:
• Terms are not usually defined.
• Relationships between the terms are not usually defined.
• Terms are arranged in a hierarchy.
• Statements about the terms are included, such as scope notes or instructions for use.
Some well known thesauri are: WordNet, NCI cancer thesaurus, MeSH
What is an ontology?
A formal conceptualization of a specified domain of interest.
Key Features:
• Terms are defined.
• Relationships between the terms are defined, allowing logical inference.
• Terms are arranged in a hierarchy.
• Expressed in a knowledge representation language such as RDFS, OBO, or OWL.
Some well known ontologies are: SNOMED, Foundational Model of Anatomy, Gene Ontology, Linnaean Taxonomy of species
Reproduced with permission, Jason Freeny
http://web.mac.com/moistproduction/flash/index.html
http://www.mkbergman.com/?m=20070516
The Ontology spectrum:
Bottom line: you get what you pay for.
OBO
Ontology Languages
Web Ontology Language (OWL)
– Standard set of logical constructs for building an ontology
– Many syntaxes
  • OWL-RDF/XML
  • OWL-XML
  • Manchester
– Many reasoners
OBO-Format
– Currently formalized by mapping to a subset of OWL
  • can be treated as another OWL syntax
A common misconception
Are ontologies about terms or things?
When you are arguing about including something in your ontology,
1. Are you arguing about what a term means?
2. Or are you arguing about what term should be adopted in your ontology language to represent a well-characterized entity or concept?
These are terminological questions, and not ontological questions. 1 is a purely linguistic dispute; 2 is primarily a practical question.
The ontological questions are:
What kind of things should we recognize in our ontology? (Never mind, for the moment, what we might choose to call them.)
What are their relations to one another? (Not: What are the relations of their terms/names to one another?)
Adapted from Gary Merrill
What matters are things
Data annotated to the right schema will be more consistent
Dessert (DES:000038)
  Ice cream (DES:000034)
    gelato (DES:000456)
    sno-cone (DES:005566)
    soft-serve (DES:000564)
    custard cream (DES:000744)
  Cake (DES:0003221)
    torte (DES:000667)
    double-layer (DES:000975)
    galette (DES:000544)
    angel food cake (DES:000765)
DES:000034 - Frozen milk and/or cream with various sugars, and flavoring such as fresh fruit and nut purees.[1]
Labels are handles, not things
Why build an ontology? A simple example
Number of genes annotated to each of the following brain parts in an ontology:
brain 20
  part_of hindbrain 15
    part_of rhombomere 10
Query brain without ontology: 20
Query brain with ontology: 45
Ontologies can facilitate grouping and retrieval of data
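The brain query above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular database implements it: the function names are made up for the example, and it assumes the three gene sets do not overlap, so that counts can simply be summed.

```python
# Gene counts annotated directly to each brain part (numbers from the slide).
direct_counts = {"brain": 20, "hindbrain": 15, "rhombomere": 10}

# part_of edges, child -> parent, as read from the slide.
part_of = {"hindbrain": "brain", "rhombomere": "hindbrain"}


def is_part_of(term, ancestor):
    """True if term is ancestor itself or reaches it by following part_of."""
    while term is not None:
        if term == ancestor:
            return True
        term = part_of.get(term)
    return False


def query(term, use_ontology):
    """Count genes for term, optionally including every part of it."""
    if not use_ontology:
        return direct_counts.get(term, 0)
    return sum(n for t, n in direct_counts.items() if is_part_of(t, term))


print(query("brain", use_ontology=False))  # 20
print(query("brain", use_ontology=True))   # 45
```

Without the part_of edges the query sees only the 20 direct annotations; with them it also collects the hindbrain and rhombomere genes.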
[Figure: phylogenetic tree with nodes A to E covering Vertebrata, Ascidians, Arthropoda, Annelida, Mollusca, and Echinodermata; structures shown include tetrapod limbs, ampullae, tube feet, and parapodia]
Querying for genes in similar structures across species
Panganiban et al., PNAS, 1997
Distal-less orthologs participate in distal-proximal pattern formation and appendage morphogenesis
[Figure panels: mouse limb, sea urchin tube feet, ascidian ampulla, polychaete parapodia]
Types, subtypes, and instances
[Figure: is_a hierarchy over the classes entity, organism, animal, mammal, cat, and human, with the individuals Peanut and Chris Shaffer attached by instance_of]
Subtyping relation: is_a = SubClassOf
How do you tell if it is an instance or a class?
Is there more than one in existence? Is the entity referencing a group of things with common properties?
Class or instance?
• There is only one Snoopy
• There is a class of things labeled “Snoopy toys”
• There is only one Alaska
• There is a class of things labeled “States”
• There is only one blue morpho in my specimen collection
• There is a class of things labeled “Blue morpho butterflies”
General Principle for Logical Definitions
Definitions are of the following Genus-Differentia form:
X = a Y which has one or more differentiating characteristics,
where Y is the is_a parent of X.
Definition: cylinder = Surface formed by the set of lines perpendicular to a plane, which pass through a given circle in that plane.
Definition: Blue cylinder = Cylinder that has color blue. (Blue cylinder is_a cylinder)
Definition: Red cylinder = Cylinder that has color red. (Red cylinder is_a cylinder)
The True Path Rule
GO Before:
cuticle synthesis
--[i] chitin metabolism
cell wall biosynthesis
--[i] chitin metabolism
----[i] chitin biosynthesis
----[i] chitin catabolism

GO After:
chitin metabolism
--[i] chitin biosynthesis
--[i] chitin catabolism
--[i] cuticle chitin metabolism
----[i] cuticle chitin biosynthesis
----[i] cuticle chitin catabolism
--[i] cell wall chitin metabolism
----[i] cell wall chitin biosynthesis
----[i] cell wall chitin catabolism
BUT: A fly chitin synthase gene could be annotated to chitin biosynthesis, and appear in a query for genes annotated to cell wall biosynthesis (and its children), which makes no sense because flies don't have cell walls.
NOW: all the subClass terms can be followed up to chitin metabolism, but cuticle chitin metabolism terms do not trace back to cell wall terms, so all the paths are true.
The pathway from a subClass all the way up to its top level parent(s) must be universally true.
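As a rough check of the chitin example, the sketch below encodes the two is_a hierarchies as plain Python dictionaries (child to parents) and walks upward from the term a fly chitin synthase gene would be annotated to. The dictionary layout and the `ancestors` helper are illustrative assumptions, not part of GO tooling.

```python
# is_a parents before the fix (child -> list of parents).
before = {
    "chitin metabolism": ["cuticle synthesis", "cell wall biosynthesis"],
    "chitin biosynthesis": ["chitin metabolism"],
    "chitin catabolism": ["chitin metabolism"],
}

# is_a parents after the fix.
after = {
    "chitin biosynthesis": ["chitin metabolism"],
    "chitin catabolism": ["chitin metabolism"],
    "cuticle chitin metabolism": ["chitin metabolism"],
    "cuticle chitin biosynthesis": ["cuticle chitin metabolism"],
    "cuticle chitin catabolism": ["cuticle chitin metabolism"],
    "cell wall chitin metabolism": ["chitin metabolism"],
    "cell wall chitin biosynthesis": ["cell wall chitin metabolism"],
    "cell wall chitin catabolism": ["cell wall chitin metabolism"],
}


def ancestors(term, parents):
    """Collect every term reachable by following is_a upward."""
    found = set()
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), []):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


fly_before = ancestors("chitin biosynthesis", before)
fly_after = ancestors("cuticle chitin biosynthesis", after)

print("cell wall biosynthesis" in fly_before)           # True: a false path
print(any("cell wall" in t for t in fly_after))         # False: all paths true
```

In the old hierarchy the fly gene is pulled into cell wall queries; in the revised one no path from the cuticle terms leads to a cell wall term.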
Where does the True Path Rule come from?
Transitivity. Some relations are transitive, and apply across all levels of the hierarchy.
For example, a cat is_a mammal, and a mammal is_a vertebrate, SO a cat is_a vertebrate.
=> This is the true path rule; it holds because the is_a relation is transitive.
Some properties are not transitive.
For example, head has_quality round, and head part_of organism. So is the organism round? Of course not!
BUT, eye part_of head, and head part_of organism, SO eye part_of organism is true, because part_of is a transitive relation.
Relations are logically defined in a common relation ontology or within each ontology that uses them.
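A minimal sketch of how transitivity drives these inferences, using the cat, eye, and head examples from the slide. The `TRANSITIVE` set and the `infer` function are names invented for this illustration; a real relation ontology declares such properties formally.

```python
# Relations declared transitive; has_quality is deliberately left out.
TRANSITIVE = {"is_a", "part_of"}

asserted = {
    "is_a": {("cat", "mammal"), ("mammal", "vertebrate")},
    "part_of": {("eye", "head"), ("head", "organism")},
    "has_quality": {("head", "round")},
}


def transitive_closure(pairs):
    """Repeatedly add (a, c) whenever (a, b) and (b, c) are present."""
    closure = set(pairs)
    while True:
        new = {(a, d) for a, b in closure for c, d in closure if b == c}
        if new <= closure:
            return closure
        closure |= new


def infer(relation):
    """Return asserted plus inferred pairs for one relation."""
    pairs = asserted[relation]
    return transitive_closure(pairs) if relation in TRANSITIVE else set(pairs)


print(("cat", "vertebrate") in infer("is_a"))         # True
print(("eye", "organism") in infer("part_of"))        # True
print(("organism", "round") in infer("has_quality"))  # False
```

Nothing in the sketch carries has_quality along part_of, so the roundness of the head is never transferred to the organism.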
Relationships and definitions
A relationship from one class to another is a formalized part of its definition (an object property in OWL)
A subtype relation (is_a in OBO, SubClassOf in OWL) specifies necessary conditions for membership of a class.
For example, finger part_of hand (all finger part_of some hand) states that a necessary condition of being in the class finger is to be part of some hand.
So… if a finger exists, it is part of some hand. But…this does not mean that if a hand exists, it has as a part a finger.
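The one-directional nature of "all finger part_of some hand" can be illustrated on a toy set of individuals. The individuals `f1`, `h1`, and `h2` and the helper `all_some` are hypothetical, introduced only for this sketch.

```python
# Hypothetical individuals and their classes.
instances = {"f1": "finger", "h1": "hand", "h2": "hand"}

# Asserted part_of links between individuals: finger f1 is part of hand h1.
part_of = {("f1", "h1")}

# has_part is simply the inverse reading of the same links.
has_part = {(whole, part) for part, whole in part_of}


def all_some(subject_class, relation, object_class):
    """True if every instance of subject_class is related to some instance of object_class."""
    return all(
        any((x, y) in relation and instances[y] == object_class for y in instances)
        for x, cls in instances.items()
        if cls == subject_class
    )


print(all_some("finger", part_of, "hand"))   # True: every finger is part of some hand
print(all_some("hand", has_part, "finger"))  # False: hand h2 has no finger
```

Hand h2 exists without any finger, so the data satisfies "finger part_of some hand" while failing the converse statement "hand has_part some finger".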
Directionality
Often, the order of the terms in an assertion will matter. We can assert:
adult transformation_of child
but not
child transformation_of adult
More about using ontology classes and relations
Universality
Entities should have the same meanings on every occasion of use. (= They should refer to the same universal types)
Basic ontological relations such as SubClassOf and part_of should be used in the same way by all ontologies.
For example, if you need a non-transitive part relation, then define a new relation for this purpose.
Meissirel C et al. PNAS 1997;94:5900-5905
©1997 by National Academy of Sciences
Symmetric
Some relations are symmetric.
If A is connected_to B, then B is connected_to A.
For example:
Retinal ganglion cell connected_to lateral geniculate nucleus
AND
Lateral geniculate nucleus connected_to retinal ganglion cell
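A short sketch of what declaring connected_to symmetric buys you: only one direction needs to be asserted, and the other follows. The helper name `symmetric_closure` is an assumption for the example.

```python
def symmetric_closure(pairs):
    """Add the reversed pair for every asserted pair."""
    return set(pairs) | {(b, a) for a, b in pairs}


# Only one direction is asserted.
connected_to = {("retinal ganglion cell", "lateral geniculate nucleus")}

inferred = symmetric_closure(connected_to)
print(("lateral geniculate nucleus", "retinal ganglion cell") in inferred)  # True
```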
Some classes are declared to never share any instances in common
OWL: DisjointWith; OBO: disjoint_from
[Figure: muscle under anatomical structure; lumen under anatomical space; lumen of gut placed under both anatomical space and anatomical structure, marked ✗ NO!]
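The disjointness check can be sketched as follows. The class placements are one reading of the slide figure (lumen of gut asserted under both parents), and the helper names are invented for the illustration; a real reasoner would report such a class as unsatisfiable.

```python
# Asserted is_a parents (child -> set of parents), as read from the figure.
parents = {
    "muscle": {"anatomical structure"},
    "lumen": {"anatomical space"},
    "lumen of gut": {"anatomical space", "anatomical structure"},
}

# Pairs of classes declared to share no instances.
disjoint = [("anatomical structure", "anatomical space")]


def ancestors(cls):
    """Collect every class reachable by following is_a upward."""
    seen = set()
    stack = [cls]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def disjointness_violations():
    """Return classes that fall under both members of a disjoint pair."""
    bad = []
    for cls in parents:
        lineage = ancestors(cls) | {cls}
        if any(a in lineage and b in lineage for a, b in disjoint):
            bad.append(cls)
    return sorted(bad)


print(disjointness_violations())  # ['lumen of gut']
```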
About reasoners
A reasoner is a piece of software able to infer logical consequences from a set of asserted facts or axioms.
Reasoners are used to check the logical consistency of ontologies and to extend them with "inferred" facts or axioms.
For example, a reasoner would infer:
Major premise: All mortals die.
Minor premise: Some men are mortals.
Conclusion: Some men die.
Different reasoners can perform slightly differently. There are a number of reasoners to be aware of: ELK, HermiT, Pellet, etc. We’ll use these later in the Protégé tutorial.
Different kinds of definitions
An ontology is a collection of axioms
An axiom is simply a sentence or a statement
Axioms can be
non-logical (aka “annotations” or “text definitions”)
• e.g. GO_0000262 has synonym ‘mtDNA’
• opaque to reasoners
logical
• well-defined semantics
• understood by reasoners
• Example: SubClassOf axioms
• Arguments can be classes or class expressions (for example, the class of things that are parts of the leg)
Anatomy ontologies: Exemplar use case
1. What makes anatomy ontologies different?
2. What kinds of anatomy ontologies exist?
3. Anatomy of an anatomy ontology class
4. Upper ontologies
5. How are anatomy ontologies used?
There are many useful ways to classify parts of organisms:
• its parts and their arrangement
• its relation to other structures (what is it: part of, connected to, adjacent to, overlapping?)
• its shape
• its function
• its developmental origins
• its species or clade
• its evolutionary history
Cajal 1915, “Accept the view that nothing in nature is useless, even from the human point of view.”
Not all classification is useful
Be practical: Build ontologies for what you need and for what can be reused
About thirty years ago there was much talk that geologists ought only to observe and not theorise; and I well remember some one saying that at this rate a man might as well go into a gravel-pit and count the pebbles and describe the colours.
C. Darwin
Classifying anatomy
appendage
  antenna
  wing
    forewing
    hindwing
Relationships record classifications too
[Figure: leg and wing, annotated with part_of some ‘thoracic segment’]
‘leg’ SubClassOf part_of some ‘thoracic segment’
The knowledge in an ontology can make the reasons for classification explicit
Any sense organ that functions in the detection of smell is an olfactory sense organ
olfactory sense organ = sense organ and capable_of some detection of smell
Asserted:
nose SubClassOf sense organ
nose SubClassOf capable_of some detection of smell
Classifying (inferred):
nose SubClassOf olfactory sense organ
=> These are necessary and sufficient conditions, also called an equivalent class axiom
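The classification step can be imitated with a very small matcher. This is a deliberately simplified sketch, not how an OWL reasoner works: it only looks at directly asserted superclasses and restrictions, and the `eye` class with `detection of light` is a hypothetical contrast case not taken from the slides.

```python
# Asserted named superclasses and existential restrictions per class.
asserted_superclasses = {
    "nose": {"sense organ"},
    "eye": {"sense organ"},  # hypothetical contrast case
}
asserted_restrictions = {
    "nose": {("capable_of", "detection of smell")},
    "eye": {("capable_of", "detection of light")},  # hypothetical
}

# Equivalent class definitions: defined class -> (genus, differentia).
definitions = {
    "olfactory sense organ": ("sense organ", {("capable_of", "detection of smell")}),
}


def classify():
    """Infer extra superclasses for classes meeting a full definition."""
    inferred = {}
    for cls, supers in asserted_superclasses.items():
        restrictions = asserted_restrictions.get(cls, set())
        for defined, (genus, differentia) in definitions.items():
            # Sufficient conditions: has the genus and every differentia.
            if genus in supers and differentia <= restrictions:
                inferred.setdefault(cls, set()).add(defined)
    return inferred


print(classify())  # {'nose': {'olfactory sense organ'}}
```

Because the definition is both necessary and sufficient, nothing had to assert that a nose is an olfactory sense organ; the placement follows from what the nose is and what it can do.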
Why create equivalent classes and class restrictions?
It is difficult to keep track of multiple classification chains and:
• ensure completeness
• avoid redundancy
• avoid true path rule violation
Compositionality and avoiding asserted multiple inheritance
We can logically define composed classes and create complex definitions from simpler ones
• aka: building blocks, cross-products, logical definitions
Descriptions can be composed at any time
• Ontology construction time (pre-composition)
• Annotation time (post-composition)
Formal necessary and sufficient definitions + a reasoner
• Automatic (and therefore manageable) classification
• Requires subtype classification, so apart from the root term(s), no term should lack a superclass parent.
Let the reasoner do the work!
Post-composed versus pre-composed anatomical entities are logically equivalent
Plasma membrane of spermatocyte
• Plasma membrane [GO CC]
• Spermatocyte [Cell Ontology]
plasma membrane and part_of some spermatocyte
• Genus: plasma membrane (Gene Ontology)
• Differentia: part_of (Relation Ontology) some spermatocyte (Cell Ontology)
What kinds of anatomy ontologies exist?
Mouse: MA (adult), EMAP / EMAPA (embryonic)
Human: FMA (adult), EHDAA2 (CS1-CS20)
Amphibian: AAO, XAO
Fish: ZFA (zebrafish), MFO (medaka), TAO (teleosts)
Nematode: WBbt (C. elegans)
Arthropod: FBbt (Drosophila), TGMA (mosquito), HAO (Hymenoptera), Arthropod anatomy ontology
Plant ontology
Species-centric and multi-species ontologies
Species neutral ontologies
CARO (common anatomy reference ontology)
Uberon (cross-species anatomy)
vHOG (vertebrate homologous organs)
CL (cell ontology)
GO (gene ontology)
Phenotype ontologies
MP mammalian phenotype
HP human phenotype
WB worm phenotype
chemical entities
Many perspectives, many ontologies – that overlap in content
[Figure: overlapping domains: gross anatomy, tissues, cells, cell anatomy, proteins, phenotypes, clinical disorders, processes, physiological processes, development, reactions, cellular processes, behavior, evolutionary characters, nervous system, neural crest]
Domain ontologies are organized according to upper ontologies (Basic Formal Ontology) that specify the general types of things that exist
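RELATION TO TIME: CONTINUANT (INDEPENDENT, DEPENDENT) and OCCURRENT
GRANULARITY: ORGAN AND ORGANISM
• Independent continuant: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO)
• Dependent continuant: Organ Function (FMP, CPRO); Phenotypic Quality (PaTO)
• Occurrent: Biological Process (GO)
GRANULARITY: CELL AND CELLULAR COMPONENT
• Independent continuant: Cell (CL); Cellular Component (FMA, GO)
• Dependent continuant: Cellular Function (GO)
GRANULARITY: MOLECULE
• Independent continuant: Molecule (ChEBI, SO, RnaO, PrO)
• Dependent continuant: Molecular Function (GO)
• Occurrent: Molecular Process (GO)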
=> Classification according to these higher level types helps ensure the True Path Rule holds
BFO high level classes
Continuant: An entity that exists in full at any time in which it exists at all, persists through time while maintaining its identity and has no temporal parts.
Independent continuant: A continuant that is a bearer of quality and realizable entities, in which other entities inhere and which itself cannot inhere in anything.
Dependent continuant: A continuant that is either dependent on one or other independent continuant bearers or inheres in or is borne by other entities.
Occurrent: An entity that has temporal parts and that happens, unfolds or develops through time. Sometimes also called perdurants.
The Common Anatomy Reference Ontology
CARO is a structural classification based on granularity
From the bottom up:
• Cell component
• Cell
• Portion of tissue
• Multi-tissue structure
From the top down:
• Organism subdivision
• Anatomical system
Acellular structures
CARO is an upper reference ontology that can be used to structure new anatomy ontologies
Using CARO as a template
[Figure: zebrafish AO classes linked by is_a to CARO classes]
Cell and cell component are cross-referenced to GO
Zebrafish classes are asserted to be subclasses of CARO classes
Example of complexity arising from multiple species-contexts
[Figure: erythrocyte under cell, with cell split into nucleate cell and enucleate cell; placing erythrocyte under either is not applicable in all contexts]
[Figure: erythrocyte split into nucleate erythrocyte and enucleate erythrocyte, alongside cell, nucleate cell, and enucleate cell; the species ontology classes zebrafish nucleate erythrocyte and human erythrocyte are attached at the appropriate level. Identifiers shown: ZFA:0009256, CL:0000562, CL:0000232, CL:0000592, FMA:81100]
Developmental Biology, Scott Gilbert, 6th ed.
Using reasoners to detect errors
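[Figure: fruit fly FBbt ‘tibia’ and human FMA ‘tibia’ are each asserted is_a UBERON: tibia; UBERON: tibia is_a UBERON: bone; UBERON: bone only_in_taxon Vertebrata; FBbt ‘tibia’ part_of Drosophila melanogaster; FMA ‘tibia’ part_of Homo sapiens; Homo sapiens is_a Vertebrata; Drosophila melanogaster disjoint_with Vertebrata, so the fly assertion is marked ✗]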
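The tibia error can be reproduced with a small taxon-constraint check. This is a sketch under stated assumptions: the class names are labels rather than real identifiers, the dictionaries encode one reading of the figure, and `find_conflicts` is a made-up helper standing in for what an OWL reasoner would do with only_in_taxon and disjoint_with axioms.

```python
# Asserted is_a links between anatomy classes (child -> parent).
is_a = {
    "FBbt:tibia": "UBERON:tibia",
    "FMA:tibia": "UBERON:tibia",
    "UBERON:tibia": "UBERON:bone",
}

# Taxon constraints and species membership.
only_in_taxon = {"UBERON:bone": "Vertebrata"}
part_of_species = {
    "FBbt:tibia": "Drosophila melanogaster",
    "FMA:tibia": "Homo sapiens",
}
taxon_is_a = {"Homo sapiens": "Vertebrata"}
disjoint_taxa = {frozenset({"Vertebrata", "Drosophila melanogaster"})}


def chain(start, links):
    """Return start followed by everything reached via single-parent links."""
    out = [start]
    while out[-1] in links:
        out.append(links[out[-1]])
    return out


def find_conflicts():
    """List classes whose species is disjoint with an inherited taxon constraint."""
    conflicts = []
    for cls, species in part_of_species.items():
        for superclass in chain(cls, is_a):
            required = only_in_taxon.get(superclass)
            if required is None:
                continue
            for taxon in chain(species, taxon_is_a):
                if frozenset({taxon, required}) in disjoint_taxa:
                    conflicts.append((cls, superclass, required, species))
    return conflicts


for conflict in find_conflicts():
    print(conflict)
# ('FBbt:tibia', 'UBERON:bone', 'Vertebrata', 'Drosophila melanogaster')
```

The human tibia passes because Homo sapiens falls within Vertebrata; the fly tibia inherits the bone constraint through UBERON: tibia and is flagged.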
Representing different levels of granularity
[Figure: GO terms cilium development, hair cell development, neuromast development, and lateral line development, with the relations between them marked “?”]
GO:cilium part_of CL:hair cell part_of CL:neuromast
CL:hair cell part_of CL:neuromast
CL:neuromast part_of ZFA:lateral line
Ontology considerations
• An ontology is a classification.
• There are many useful ways to classify anatomy.
• Maintaining multiple classification schemes by hand is impractical. So you should automate it.
• Everybody makes mistakes. So you should let the computer find errors for you.
• Re-use other people’s work where possible: import class hierarchies, use common patterns.
• Cautionary note: formal languages have limitations. Don’t expect to be able to express everything!