Ontologies
German Rigau i [email protected]
IXA group, Departamento de Lenguajes y Sistemas Informáticos
UPV/EHU
Ontologies: Outline
Introduction
Mikrokosmos
SUMO
Cyc, OpenCyc
Ontologies: What is an Ontology?
An ontology is an explicit specification of a conceptualization (Gruber 93).
A conceptualization is an abstract, simplified view of the world, represented for some purpose.
An ontology is a description (a formal specification) of a set of concepts and relationships that enables knowledge sharing and reuse (by making ontological commitments).
An ontological commitment is an agreement to use a vocabulary in a way that is consistent with the theory specified by the ontology.
Ontologies: What is an Ontology?
• “A specific artifact designed with the purpose of expressing the intended meaning of a (shared) vocabulary” (Guarino 03)
• “In philosophy, ontology (from the Greek ὤν = being and λόγος = word/speech) is the most fundamental branch of metaphysics. It is the study of being or existence as well as the basic categories thereof — trying to find out what entities and what types of entities exist. Ontology has strong implications for the conceptions of reality.” (from Wikipedia)
• The term “ontology” dates to the 17th century; metaphysics goes back to Aristotle.
Ontologies: Disclaimer
Ontologies are a BIG topic!
Main focus here: NLP and NLU
Learn the basics from the experts:
Gruber papers: various papers on the web
Sowa book: Knowledge Representation
Guarino tutorial: Ontology-driven conceptual modeling, and various papers
Hovy tutorial!
Ontologies: Why use an Ontology?
You need an unambiguous set of symbols for semantic representations. In (eat John tiramisu): which eat? which John?
You need to organize your symbols and/or variables according to the way they are processed: nouns act differently from verbs in general.
Can you do without an ontology? Of course you can: most of (today’s) statistical NLP does.
Our definition for this course: An ontology is a data structure in which symbols that represent conceptualizations are defined and manipulated by (NLP) software.
Ontologies: What is inside an Ontology?
T-Box
Concepts: represent a conceptualization; the class of all the examples of that event or entity
Happiness, children
Relations: represent a relationship between concepts
Colour-of, location-of
Axioms: express a necessary fact holding between concepts and relationships
If X is mortal then X will die one day
A-Box
Instances: represent a specific individual
Albert Einstein
…but what about Beethoven’s 9th symphony?
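As a rough illustration of the four ingredients above, here is a minimal Python sketch of an ontology as a data structure. The class name `Ontology`, the concept name `WillDie`, and the encoding of an axiom as a (premise, conclusion) pair are assumptions made for this example only, not part of any particular ontology system.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    # T-Box: terminological knowledge (hypothetical, simplified encoding)
    concepts: set = field(default_factory=set)
    relations: set = field(default_factory=set)
    axioms: list = field(default_factory=list)      # (premise, conclusion) concept pairs
    # A-Box: assertions about specific individuals
    instances: dict = field(default_factory=dict)   # individual -> set of concepts

    def assert_instance(self, individual, concept):
        """Record that an individual belongs to a known concept."""
        if concept not in self.concepts:
            raise KeyError(f"unknown concept: {concept}")
        self.instances.setdefault(individual, set()).add(concept)

    def apply_axioms(self):
        """Propagate 'if X is P then X is Q' axioms until nothing changes."""
        changed = True
        while changed:
            changed = False
            for premise, conclusion in self.axioms:
                for concepts in self.instances.values():
                    if premise in concepts and conclusion not in concepts:
                        concepts.add(conclusion)
                        changed = True


onto = Ontology(concepts={"Mortal", "WillDie"},
                relations={"colour-of", "location-of"})
onto.axioms.append(("Mortal", "WillDie"))   # if X is mortal then X will die one day
onto.assert_instance("Albert Einstein", "Mortal")
onto.apply_axioms()
print(onto.instances["Albert Einstein"])
```

The Beethoven question is exactly where such a simple split gets strained: it is not obvious whether the 9th symphony should be stored in `instances` or in `concepts` (with each performance as an instance).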
Ontologies: Content building steps (1)
List terms that denote the entities, events, qualities, relationships, etc. in the domain
Link them using one or more relations:
structuring relations (subsumption, others)
definitional relations
additional info relations
Define axioms and properties:
rules that specify what must be true about what
Provide additional information resources:
lexicons, glossaries, documentation, etc.
Terminology ‘ontology’ (e.g., WN)
‘True’ ontology
External resources
Ontologies: Content building steps (2)
Find a primitive concept (e.g. human).
Specialize it in various ways by adding differentiae. Ex: man, woman (:sex); adult, child (:age).
Define these differentiae elsewhere in the ontology.
Don’t confuse definitional aspects with mere properties! An apple is-a fruit with essential differentia XXX and with properties :colour=red, :size=tennis-ball-sized…
Problems: What are the differentiae? How do you order them?
[Figure: two alternative hierarchies under human. (a) human is split by :age into adult and child, then by :sex into man/woman and boy/girl. (b) human is split by :sex into male-person and female-person, then by :age into man/boy and woman/girl.]
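The ordering problem can be made concrete with a small Python sketch: the same two differentiae, applied in a different order, give two different hierarchies over the same four leaf concepts. The function name `build` and the dictionary encoding are assumptions for illustration.

```python
# Differentiae and their values; leaf names follow the slide's example.
DIFFERENTIAE = {"age": ["adult", "child"], "sex": ["male", "female"]}
LEAF = {
    ("adult", "male"): "man",
    ("adult", "female"): "woman",
    ("child", "male"): "boy",
    ("child", "female"): "girl",
}


def build(order):
    """Split 'human' by the differentiae in the given order; return a nested dict."""
    def rec(remaining, chosen):
        if not remaining:
            return LEAF[(chosen["age"], chosen["sex"])]
        first, rest = remaining[0], remaining[1:]
        return {value: rec(rest, {**chosen, first: value})
                for value in DIFFERENTIAE[first]}
    return {"human": rec(list(order), {})}


by_age_first = build(["age", "sex"])
by_sex_first = build(["sex", "age"])
print(by_age_first)
print(by_sex_first)
```

Both trees contain man, woman, boy and girl, but the intermediate concepts differ (adult/child versus male-person/female-person), which is why the choice of ordering matters.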
Ontologies: The prototypes (1)
Functional purpose of classes: “provide maximum information with the least cognitive effort”
There are established experimental paradigms for determining how good an example of a category a member is judged to be.
Basic Level categories: a basic category is the largest class of which we can form a fairly concrete image, like chair or ball. These are the first classifications that children make.
Superordinate categories are collections of basic categories: furniture includes chairs, lamps, desks, beds, etc.; toys include balls, dolls, furry animals. No single object clearly represents them.
Subordinate categories represent divisions of basic classes (deck chairs, bar stools, teddy bears, school desks).
Ontologies: The prototypes (2)
traditional assumption: people categorize using the common features of the members (differentiae)
observations:
(1) When people categorize, they cannot tell you what features they are using; often they don’t know the differentiae!
(2) When people categorize, they usually find some members of categories more “typical” (“better”) than others (e.g., a robin is a better member of the category Bird than an ostrich)
(3) When people categorize, they categorize more typical members more quickly than less typical ones
suggestion: Create a ‘star structure’ of prototypes rather than (or in addition to?) a subsumption hierarchy with differentiae
Ontologies: The prototypes (3)
Base Concepts (BC) were introduced in EuroWordNet. The BC are supposed to be the concepts that play the most important role in different languages. Two main criteria:
A high position in the semantic hierarchy (abstract)
Having many relations to other concepts (hub)
Basic Level Concepts (BLC) are the result of a compromise between two conflicting principles of characterization:
Represent as many concepts as possible (abstract)
Represent as many distinctive features as possible (concrete)
BC ≠ BLC
Ontologies: Domain ontologies
Computational / expert systems:
Protégé Ontologies Library: Stanford University’s collection of 18 influential ontologies (http://protege.stanford.edu/ontologies/ontologies.html)
OntoSelect: over 700 ontologies in various domains (http://views.dfki.de/Ontologies/)
Medical:
UMLS: the Metathesaurus (over 1 million biomedical concepts and 5 million concept names from over 100 controlled vocabularies and classifications, some in multiple languages, used in patient records, administrative health data, bibliographic and full-text databases, and expert systems), the Semantic Network (isa for the type hierarchy; physically related, spatially related, temporally related, functionally related, conceptually related), and the SPECIALIST lexicon (http://www.nlm.nih.gov/research/umls/)
Industrial etc.:
NAICS (North American Industry Classification System): numerical classifications of construction, agriculture, technology, wholesale, retail, industry, etc. (http://www.census.gov/epcd/www/naics.html)
Ontologies: Levels of Knowledge
Methodological Knowledge (modeling ideas: metatypes): ontology languages and models (OWL, KIF)
Conceptual Knowledge (conceptual model: types): specific ontologies, with classes and properties/relations (SUMO, TCO)
Factual Knowledge (factual model: objects): description of specific data, i.e. individuals and their relations
The levels are linked by instantiation.
[Figure: example diagram with the nodes Entity, Actor, BusinessObject, BusinessProcess, Activity, Action, person, employee, Lab, Procurement, Luigi Bianchi, Mario Rossi, LEKS, IDEA, purchasingX, purchasingY, connected by ISA and instantiation links]
Ontologies: Conceptual and Factual Knowledge
Conceptual Knowledge (KR): information needed to understand and process semantics.
Knowledge, such as: a hotel is composed of a reception, some rooms, etc.
Factual Knowledge (FR): information on the content of the concepts.
Data, such as: the Holiday Inn hotel has 250 rooms, the prices are…
Ontologies: What kind of Ontologies?
Ontologies: Methodological knowledge: OWL Constructors
Conjunction
Disjunction
Negation
Choice between instances
Universal quantifier
Existential quantifier
Cardinality constraints
Inclusion between classes
Equivalence between classes
Inclusion between properties
Equivalence between properties
Ontologies: OWL, the Web Ontology Language
3 versions with different complexity and expressive power:
OWL Lite: poor expressive power; used for examples
OWL DL: based on Description Logics (DL); good expressive power; reasoning capabilities; widespread
OWL Full: very expressive, but no reasoning
Ontologies: Processing knowledge through reasoning
Ontology:
NaturallyOccurringWaterSource
  Stream
    Brook, River, Tributary
  BodyOfWater
    Lake, Ocean, Sea
RDF Fact:
<River rdf:ID="http://www.china.org/geography/rivers/Yangtze"
       xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
       xmlns="http://www.geodesy.org/river#">
  <length>6300 kilometers</length>
  <emptiesInto rdf:resource="http://www.china.org/EastChinaSea"/>
</River>
User query: “Show me all the documents that contain info about streams”
REASONER (inference engine) results: Yangtze is a River. A River is a Stream, so this document is relevant to the query.
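The inference step in this example is plain subsumption: follow subclass links upwards from the class of the fact until the query class is reached. A minimal Python sketch follows, assuming the class hierarchy shown on the slide; the names `SUBCLASS_OF`, `FACTS`, `is_a` and `documents_about` are made up for this illustration and do not correspond to any OWL reasoner API.

```python
# Class hierarchy from the slide, stored as child -> parent.
SUBCLASS_OF = {
    "Stream": "NaturallyOccurringWaterSource",
    "BodyOfWater": "NaturallyOccurringWaterSource",
    "Brook": "Stream",
    "River": "Stream",
    "Tributary": "Stream",
    "Lake": "BodyOfWater",
    "Ocean": "BodyOfWater",
    "Sea": "BodyOfWater",
}

# The RDF fact: the Yangtze resource is typed as a River.
FACTS = {"http://www.china.org/geography/rivers/Yangtze": "River"}


def is_a(cls, ancestor):
    """True if cls equals ancestor or is a (transitive) subclass of it."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False


def documents_about(query_class):
    """Return the resources whose class is subsumed by the query class."""
    return [doc for doc, cls in FACTS.items() if is_a(cls, query_class)]


print(documents_about("Stream"))
```

A keyword search for "stream" would miss this document, since the word never occurs in it; the match comes only from the ontology.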
Ontologies: Ontology languages
First Order Logic
CycL, F-Logic, Loom, KIF, Ontolingua, SHOE, RDFS, OIL, OWL, ...
Trade-off between
Expressive power
Reasoning power
The following statements are not expressible in OWL-DL ...
homeWorker(x) <- work(x,y) ^ live(x,z) ^ loc(y,w) ^ loc(z,w)
r(x,z) <- r(x,y) ^ r(y,z)
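Both rules above are easy to evaluate directly over a set of facts, which gives a feel for what a rule language adds. The following is a minimal Python sketch; the fact data (ana, bob, office1, ...) and the function names are invented for illustration.

```python
# Toy facts (hypothetical): who works where, who lives where, where places are.
work = {("ana", "office1"), ("bob", "office2")}
live = {("ana", "flat1"), ("bob", "flat2")}
loc = {("office1", "bilbao"), ("flat1", "bilbao"),
       ("office2", "madrid"), ("flat2", "bilbao")}


def home_workers(work, live, loc):
    """homeWorker(x) <- work(x,y) ^ live(x,z) ^ loc(y,w) ^ loc(z,w)"""
    places = {}
    for thing, where in loc:
        places.setdefault(thing, set()).add(where)
    return {x
            for x, y in work
            for x2, z in live
            if x == x2 and places.get(y, set()) & places.get(z, set())}


def transitive_closure(r):
    """Apply r(x,z) <- r(x,y) ^ r(y,z) until no new pairs are derived."""
    closure = set(r)
    while True:
        new = {(x, z) for x, y in closure for y2, z in closure if y == y2} - closure
        if not new:
            return closure
        closure |= new


print(home_workers(work, live, loc))
print(sorted(transitive_closure({("a", "b"), ("b", "c"), ("c", "d")})))
```

The first rule needs the shared variable w to tie two different chains of relations together, which is the kind of pattern that is hard to express with class constructors alone.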
Ontologies: OWL tools
Editors and browsers
Protégé
SWOOP
OntoTrack
OWL-compliant reasoners
Pellet
FaCT++
Ontologies: KIF (Knowledge Interchange Format)
More expressive than FOL
Few tools for KIF
Sigma editor and browser
Vampire (Riazanov & Voronkov 2002)
E-prover
...
Ontologies: Authority
Who decides? Which features are the determining ones? Why?
There is no authority: it can be tradition, the law, social consensus, or simply an ad hoc, purpose-driven choice.
The point is to know which to adopt and to be careful and consistent.
Ontologies: Outline
Introduction
Mikrokosmos
SUMO
Cyc
Ontologies: Mikrokosmos
Representational Issues
  The Lexicon
  The Ontology
Acquisition Process
  Lexicon Acquisition
  Guidelines
  Ontology/Lexicon Trade-off
Semantics in Action
Mikrokosmos: Introduction
Knowledge-Based Machine Translation (KBMT)
CRL, NMSU (Viegas et al. 96)
5,000 concepts
Events
Objects
Properties
7,000 Spanish word senses
40,000 word senses after expansion with productive Lexical Rules
comprar -> comprador, comprable, ... (to buy -> buyer, purchasable, ...)
Text Meaning Representation
Mikrokosmos: Representational Issues: The Lexicon
Typed Feature Structures (Pollard and Sag 87)
language-dependent
10 zones:
phonology
orthography
morphology
syntactic (subcategorization)
semantic (Lexical Semantic Representation)
syntax-semantic linking
stylistics
paradigmatic
syntagmatic
Mikrokosmos: Representational Issues: The Lexicon
Adquirir-V1
  syn: subj: cat: NP
       obj: cat: NP
  sem: acquire
       agent: HUMAN
       theme: OBJECT
Adquirir-V2
  syn: subj: cat: NP
       obj: cat: NP
  sem: acquire
       agent: HUMAN
       theme: INFORMATION
Mikrokosmos: Representational Issues: The Ontology
Taxonomic, multi-hierarchical
14 local or inherited links on average
language-impartial
EVENTS, OBJECTS, PROPERTIES
Methodology & Guidelines
Mikrokosmos: Representational Issues: The Ontology
ACQUIRE
DEFINITION “The transfer of possession event where the agent transfers an object to its possession”
IS-A TRANSFER-POSSESSION
SOURCE HUMAN PLACE
THEME OBJECT (NOT HUMAN)
AGENT ANIMAL (DEFAULT HUMAN)
DESTINATION ANIMAL PLACE (DEFAULT HUMAN)
INHERITED
BENEFICIARY HUMAN
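A frame such as ACQUIRE mixes local slots with slots inherited through IS-A. A minimal Python sketch of that lookup is given below. It is not the Mikrokosmos implementation: the dictionary layout and the function `get_slot` are illustrative, and placing BENEFICIARY on TRANSFER-POSSESSION is an assumption, since the slide only says the slot is inherited, not from which ancestor.

```python
# Frames as plain dicts; slot fillers are taken from the ACQUIRE slide.
# ASSUMPTION: BENEFICIARY is declared on TRANSFER-POSSESSION.
FRAMES = {
    "TRANSFER-POSSESSION": {"BENEFICIARY": "HUMAN"},
    "ACQUIRE": {
        "IS-A": "TRANSFER-POSSESSION",
        "THEME": "OBJECT",
        "AGENT": "ANIMAL",
    },
}


def get_slot(concept, slot):
    """Return (filler, owner) for a slot, walking IS-A links upwards; None if absent."""
    while concept is not None:
        frame = FRAMES.get(concept, {})
        if slot in frame:
            return frame[slot], concept
        concept = frame.get("IS-A")
    return None


print(get_slot("ACQUIRE", "THEME"))
print(get_slot("ACQUIRE", "BENEFICIARY"))
```

Returning the owning concept along with the filler makes it visible whether a value is local or inherited, mirroring the INHERITED label on the slide.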
Mikrokosmos: Acquisition Process: The Lexicon
Multi-lingual: French, English, Japanese, Russian, Spanish, etc.
Multi-media
Multi-process:
Analysis
Generation (mono and multilingual)
MT
Summarization
IE
Speech Processing
Tools: corpus-search, lookup dictionary, ontology browser
Mikrokosmos: Acquisition Process: The Ontology
Guidelines:
1) Do not add instances as concepts
   Instances do not have their own instances
   Concepts do not have a fixed position in space/time
2) Do not decompose concepts further
3) Use close concepts
4) Do not add EVENTs with particular arguments
5) Do not add concepts with instance-specific aspects, temporal relations
6) Do not add language-specific concepts
7) Do not add ontological concepts for collections
Mikrokosmos: Acquisition Process: Ontology/Lexicon Trade-off
Daily negotiations between lexicon acquirers and ontology acquirers
Possibilities:
one-to-one mapping
lexicon unspecification
lexicon-ontology balance
Mikrokosmos: Acquisition Process: Ontology/Lexicon Trade-off
One-to-one mapping
PREPARE-FOOD
  INST: COOKING-EQUIPMENT
COOK
  INST: STOVE
BAKE
  INST: OVEN
cook : cuire sur le feu
bake : cuire au four
Problems
Lexical: every word in a language is a concept
Conceptual: cuire in French is not ambiguous
Mikrokosmos: Acquisition Process: Ontology/Lexicon Trade-off
Lexicon-Ontology Balance
PREPARE-FOOD
  INST: COOKING-EQUIPMENT
FRY
  INST: STOVE
  INST: FRYING-PAN
BAKE
  INST: OVEN
cook : cuire
bake
Mikrokosmos: Semantics in Action
El grupo Roche, a través de su compañía en España, adquirió Doctor Andreu. (The Roche group, through its company in Spain, acquired Doctor Andreu.)
El grupo Roche adquirió Doctor Andreu a través de su compañía en España. (The Roche group acquired Doctor Andreu through its company in Spain.)
La adquisición de Doctor Andreu por el grupo Roche fue hecha a través de su compañía en España. (The acquisition of Doctor Andreu by the Roche group was made through its company in Spain.)
ACQUIRE-1
  Agent: ORGANIZATION-1
  Theme: ORGANIZATION-2
  Instrument: ORGANIZATION-3
ORGANIZATION-1 Object-Name: Grupo Roche
ORGANIZATION-2 Object-Name: Doctor Andreu
ORGANIZATION-3 Location: España
Mikrokosmos: Acquisition Process: Ontology/Lexicon Trade-off
Lexicon Unspecification
PREPARE-FOOD
  INST: COOKING-EQUIPMENT
cook : cuire sur le feu
bake : cuire au four, INST: OVEN
Problems: BAKE is not in the ontology
Mikrokosmos: Semantics in Action
Onto-Search:
Ontological search mechanism to check constraints
check-onto(ACQUIRE, EVENT) = 1, since ACQUIRE is a type of EVENT
check-onto(ORGANIZATION, HUMAN) = 0.9, since ORGANIZATION HAS-MEMBER HUMAN
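The behaviour of check-onto on these two examples can be imitated with a very small Python sketch: an IS-A path scores 1, and another kind of link scores less. This is a heavily simplified stand-in, not the actual Onto-Search algorithm. The tables `IS_A` and `OTHER_LINKS`, the shortcut TRANSFER-POSSESSION IS-A EVENT, the placement of ORGANIZATION and HUMAN under OBJECT, and the score 0.0 for unrelated concepts are all assumptions; only the two scores 1 and 0.9 come from the slide.

```python
# Child -> parent IS-A links (illustrative fragment, partly assumed).
IS_A = {
    "ACQUIRE": "TRANSFER-POSSESSION",
    "TRANSFER-POSSESSION": "EVENT",
    "ORGANIZATION": "OBJECT",
    "HUMAN": "OBJECT",
}

# Non-taxonomic links with a score below 1 (the 0.9 is the slide's value).
OTHER_LINKS = {("ORGANIZATION", "HUMAN"): ("HAS-MEMBER", 0.9)}


def check_onto(concept, constraint):
    """Score how well a concept satisfies a constraint concept."""
    node = concept
    while node is not None:
        if node == constraint:
            return 1.0                      # reached through IS-A only
        node = IS_A.get(node)
    link = OTHER_LINKS.get((concept, constraint))
    return link[1] if link else 0.0         # assumed score when nothing connects them


print(check_onto("ACQUIRE", "EVENT"))
print(check_onto("ORGANIZATION", "HUMAN"))
```

Graded scores like these let the analyzer prefer, rather than strictly require, a filler of the expected type, so an ORGANIZATION can still fill a slot that asks for a HUMAN.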
Mikrokosmos: Semantics in Action
1) a-través-de: INSTRUMENT, LOCATION
   adquirir requires a PHYSICAL-OBJECT
2) en: LOCATION, TEMPORAL
   España is not a TEMPORAL-OBJECT
3) adquirir: ACQUIRE, LEARN
   Doctor Andreu is not an INFORMATION
4) Doctor Andreu: ORGANIZATION, HUMAN
   the Theme of ACQUIRE is not HUMAN
5) compañía: CORPORATION, SOCIAL-EVENT
   ORGANIZATIONs typically fill the INSTRUMENT slot of ACQUIRE acts
Mikrokosmos: Experiment: WSD
Text                 1      2      3      4      Mean
words                347    385    370    353    364
words/sentence       16.5   24.0   26.4   20.8   21.4
open-class words     183    167    177    177    176
ambiguous words      57     42     57     35     48
syntax               21     19     20     12     18
correct              51     41     45     34     43
%                    97     99     93     99     97
Mikrokosmos: Experiment: WSD
Text                 Mean   Mean Unseen
words                364    390
words/sentence       21.4   26
open-class words     176    104
ambiguous words      48     26
syntax               18     9
correct              43     23
%                    97     97
Ontologies: Outline
Introduction
Mikrokosmos
SUMO
Cyc
Ontologies: SUMO
Introduction
Mapping SUMO to WordNet
SIGMA
Vampire & other theorem provers
SUMO: Introduction
The Suggested Upper Merged Ontology (SUMO)
IEEE Standard Upper Ontology Working Group
An upper ontology is limited to concepts that are meta, generic, abstract, general enough to address a broad range of domain areas.
To promote:
Interoperability
Information search and retrieval
Automated inference
NLP
Development of domain ontologies
SUMO: Introduction
Incorporates over 50 publicly available sources of high-level ontological content
May be used without fee for any purpose (including for profit)
Refined extensively on the basis of input from SUO mailing list participants
42 publicly released versions (approximately 1,000 concepts, 4,000 assertions, and 600 rules so far)
SUMO: Mapping SUMO to WordNet
Facilitates use of SUMO by those who lack extensive training in logic and mathematics
Allows SUMO to be used automatically by applications that process free text
Completeness check on SUMO content
Testing SUMO with a state-of-the-art theorem prover:
Redundancy
Contradiction
SUMO: Mapping SUMO to WordNet
Align the noun, verb and adjective databases (96,000 synsets) of WordNet 1.6 to SUMO concepts
Mapping suffixes:
synonymousExternalConcept =
subsumingExternalConcept +
Instance @
00008864 03 n 03 plant 0 flora 0 plant_life 0 . . . | a living organism lacking the power of locomotion &%Plant=
00048640 04 n 01 insider_trading 0 001 @ 00047814 n 0000 | buying or selling corporate stock by a corporate officer or other insider &%FinancialTransaction+
00821498 04 n 01 Actium 0 002 @ 00614512 n 0000 #p 06449758 n 0000 | naval battle where Antony and Cleopatra were defeated by Octavian's fleet under Agrippa in 31 BC &%Battle@
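The mapping is carried by the `&%Concept` tag plus a one-character suffix at the end of each WordNet data line, so it can be read with a short regular expression. A minimal Python sketch, assuming lines shaped like the three examples above; `parse_mapping` and the constant names are made up for this illustration.

```python
import re

# Suffix -> relation name, as listed on the slide.
SUFFIX = {
    "=": "synonymousExternalConcept",
    "+": "subsumingExternalConcept",
    "@": "instance",
}

# A SUMO tag looks like "&%Plant=": marker, concept name, one-character suffix.
MAPPING = re.compile(r"&%(\w+)([=+@])")

LINES = [
    "00048640 04 n 01 insider_trading 0 001 @ 00047814 n 0000 | buying or selling "
    "corporate stock by a corporate officer or other insider &%FinancialTransaction+",
    "00821498 04 n 01 Actium 0 002 @ 00614512 n 0000 #p 06449758 n 0000 | naval battle "
    "where Antony and Cleopatra were defeated by Octavian's fleet under Agrippa in 31 BC "
    "&%Battle@",
]


def parse_mapping(line):
    """Return (synset_offset, sumo_concept, relation), or None if the line has no tag."""
    match = MAPPING.search(line)
    if match is None:
        return None
    concept, suffix = match.groups()
    return line.split()[0], concept, SUFFIX[suffix]


for line in LINES:
    print(parse_mapping(line))
```

Requiring the `&%` marker keeps the parser from confusing the SUMO suffix `@` with the `@` hypernym pointer that WordNet itself uses earlier in the same line.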
SUMO: Domain Specific Ontologies
Air force planning
Finance and investment
Real Estate
Terrain features
Computers and Networks (Quality of Service)
Army planning
ECommerce services
Ontologies developed outside Teknowledge:
Biological viruses
Intellectual property
Linguistic elements
SUMO: Example: Boiling
(subclass Boiling StateChange)
(documentation Boiling "The Class of Processes where an Object is heated and converted from a Liquid to a Gas.")
(=>
  (instance ?BOIL Boiling)
  (exists (?HEAT)
    (and
      (instance ?HEAT Heating)
      (subProcess ?HEAT ?BOIL))))
"if instance BOIL Boiling, then there exists HEAT such that instance HEAT Heating and subProcess HEAT BOIL"
SUMO: Example: Boiling
(=>
  (and
    (instance ?BOIL Boiling)
    (patient ?BOIL ?OBJ))
  (exists (?PART)
    (and
      (part ?PART ?OBJ)
      (holdsDuring (BeginFn (WhenFn ?BOIL)) (attribute ?PART Liquid))
      (holdsDuring (EndFn (WhenFn ?BOIL)) (attribute ?PART Gas)))))
"if instance BOIL Boiling and patient BOIL OBJ, then there exists PART such that part PART OBJ and holdsDuring BeginFn WhenFn BOIL attribute PART Liquid and holdsDuring EndFn WhenFn BOIL attribute PART Gas"
SUMO: Reasoning with Sigma (Vampire)
(forall (?X) (or (not (instance ?X FloweringPlant)) (not (instance ?X BodyPart))))   YES.
(exists (?X) (and (instance ?X FloweringPlant) (instance ?X BodyPart)))   NO.
(forall (?Y) (forall (?X) (=> (equal ?X ?Y) (equal ?X ?Y))))   YES.
(forall (?X) (=> (instance ?X Flower) (instance ?X Organ)))   YES.
(instance Ear FloweringPlant)   NO.
(instance Ear Organ)   NO.
(subclass Ear Organ)   YES.
(forall (?X) (=> (not (subclass ?X Organ)) (subclass ?X Flower)))   NO.
(forall (?X) (=> (subclass ?X Flower) (subclass ?X Organ)))   YES.
SUMO: Reasoning with Sigma (Vampire) but ...
(forall (?X) (=> (instance ?X Organ) (instance ?X Flower) ))   YES!!!
(forall (?X) (=> (subclass ?X Organ) (subclass ?X Flower)))   YES!!!
(forall (?X) (=> (not (subclass (?X) Organ)) (subclass (?X) Flower)))   YES!!!
(forall (?X) ( or (not (subclass (?X) Organ)) (subclass (?X) Flower) )))   YES!!!
(forall (?X) (or (not (instance ?X Organ)) (instance ?X Flower) ) )   YES!!!