CSE-291: Ontologies in Data Integration
Department of Computer Science & Engineering
University of California, San Diego
Spring 2003
Ontologies in Action
Amarnath Gupta
[email protected]

Overview
• Information Integration
  – Querying with Ontologies
  – Registering Into Ontologies
• Ontologies of Processes
  – An Application Scenario
  – A Disease Map
• A look at a theory

Ontologies in Information Integration
• Why is information integration with ontologies different from “regular” information integration?
• Regular Information Integration
  – Assume relational sources S1, S2
  – S1 exports relation R1(patientID, brain_region, brain_vol)
  – S2 exports relation R2(species, brain_region, protein, density)
  – Define an “integrated view”
    • V1(B, V, P, D) if R1(_, B, V) ∧ R2(“human”, B, P, D)
  – A query against the view
    • Ans(Brain_region, Protein) if V1(Brain_region, V, Protein, D) ∧ D > 0 ∧ V < 0.25

Information Integration under GAV
• The query against the view:
  Ans(Brain_region, Protein) if V1(Brain_region, V, Protein, D) ∧ D > 0 ∧ V < 0.25
• Unfold the view definition:
  Ans(Brain_region, Protein) if R1(_, Brain_region, V) ∧ R2(“human”, Brain_region, Protein, D) ∧ D > 0 ∧ V < 0.25
• Make the join condition explicit:
  Ans(Brain_region1, Protein) if R1(_, Brain_region1, V) ∧ R2(“human”, Brain_region2, Protein, D) ∧ Brain_region1 = Brain_region2 ∧ D > 0 ∧ V < 0.25
• Move each selection next to the relation it restricts:
  Ans(Brain_region1, Protein) if R1(_, Brain_region1, V) ∧ V < 0.25 ∧ R2(“human”, Brain_region2, Protein, D) ∧ D > 0 ∧ Brain_region1 = Brain_region2
• Assign each subgoal to the site that evaluates it:
  Ans(Brain_region1, Protein) if [R1(_, Brain_region1, V) ∧ V < 0.25] @S1 ∧ [R2(“human”, Brain_region2, Protein, D) ∧ D > 0] @S2 ∧ [Brain_region1 = Brain_region2] @mediator
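
The last rewriting step can be sketched in Python: each source evaluates its own selection, and the mediator performs only the join. This is a minimal sketch, not the mediator's actual implementation. The tuples in R1 and R2 and the function names query_s1, query_s2 and answer are assumptions made up for illustration.

```python
# Hypothetical sample data; the tuples are illustrative, not from the lecture.
R1 = [  # exported by S1: (patientID, brain_region, brain_vol)
    ("p1", "cerebellum", 0.20),
    ("p2", "hippocampus", 0.40),
]
R2 = [  # exported by S2: (species, brain_region, protein, density)
    ("human", "cerebellum", "calbindin", 3.5),
    ("human", "cerebellum", "neuN", 0.0),
    ("human", "hippocampus", "neuN", 1.2),
    ("rat", "cerebellum", "neuN", 0.8),
]


def query_s1(max_vol):
    """Subgoal pushed to S1: R1(_, Region1, V) with V < max_vol."""
    return {region for (_pid, region, vol) in R1 if vol < max_vol}


def query_s2(species):
    """Subgoal pushed to S2: R2(species, Region2, Protein, D) with D > 0."""
    return {(region, protein)
            for (sp, region, protein, density) in R2
            if sp == species and density > 0}


def answer():
    """Join evaluated at the mediator: Region1 = Region2."""
    small_regions = query_s1(0.25)
    return sorted((region, protein)
                  for (region, protein) in query_s2("human")
                  if region in small_regions)


print(answer())
```

Pushing V < 0.25 and D > 0 down to the sources keeps the intermediate results shipped to the mediator small, which is the point of the reordering step.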

[Figure: ontology graph of the cerebellum. Node labels: brain, brain stem, cerebellum, cerebellar peduncle (Sup. CP, Mid. CP, Inf. CP), fiber bundle, axon, neuron, compartment, dendrite, cell body, vermis, cortex, folia, Purkinje cell, granule cell, medullary center, flocculonodular lobe, corpus cerebelli, flocculus, posterolateral fissure, primary fissure, l. cerebellar hemisphere, r. cerebellar hemisphere, paravermal zone, anterior lobe, posterior lobe, deep cerebellar nuclei, molecular layer, Purkinje cell layer, granular layer, dentate nucleus, inf. olive nucleus, globose nucleus, interposed nucleus, fastigial nucleus. Edge labels: receives_afferent_from, attaches(cp, cerebellum, bstem).]

Effect of an Ontology in GAV Integration
• Ontologies provide
  – relations (subclass, part-of, …) over terms, and axioms about relations
    • Part-of can be of different kinds
      – member-collection (axons are part of a fiber bundle)
      – component-object (compartments like the axon are components of a neuron)
      – portion-mass (myelin sheaths around axons constitute the white matter of the brain)
      – stuff-object (cytosol is the constituent part of cytoplasm)
      – phase-activity (metastasis is a phase of cancer)
      – place-area (Manhattan is a place in New York)
      – feature-event
    • Each flavor of part-of has its own transitive relation part-of-tr; transitivity holds within a flavor but not necessarily across flavors
      – An arm is part of a musician, and a musician is part of an orchestra, BUT an arm is *not* part of an orchestra!
  – constraints in the form of logic statements
    • Intensional (derived) relations:
      – inside(a,b) if part_of(mc)(a,b) ∨ part_of(co)(a,b) ∨ part_of(pm)(a,b) ∨ spatially_in(a,b)
    • Integrity constraints
      – The protein “neuN” is not expressed in Purkinje cells
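
The point about transitivity holding only within one flavor of part-of can be made concrete with a small Python sketch. The facts in PART_OF are assumptions chosen for illustration (including the flavor assigned to each), and part_of_tr is a hypothetical helper name.

```python
from collections import defaultdict

# Hypothetical part-of facts: (part, whole, flavor). Flavor tags are assumed.
PART_OF = [
    ("arm", "musician", "component-object"),
    ("musician", "orchestra", "member-collection"),
    ("axon", "neuron", "component-object"),
    ("neuron", "cerebellum", "component-object"),
]


def part_of_tr(facts):
    """Compute the transitive closure separately within each flavor."""
    by_flavor = defaultdict(set)
    for part, whole, flavor in facts:
        by_flavor[flavor].add((part, whole))

    closure = {}
    for flavor, pairs in by_flavor.items():
        pairs = set(pairs)
        changed = True
        while changed:
            # Compose pairs (a, b) and (b, d) of the SAME flavor into (a, d).
            new = {(a, d) for (a, b) in pairs for (c, d) in pairs if b == c}
            changed = not new <= pairs
            pairs |= new
        closure[flavor] = pairs
    return closure


closure = part_of_tr(PART_OF)
print(("axon", "cerebellum") in closure["component-object"])
print(any(("arm", "orchestra") in pairs for pairs in closure.values()))
```

Because the arm/musician and musician/orchestra facts carry different flavors, they are never composed, so no flavor's closure contains (arm, orchestra).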

Effect of an Ontology in GAV Integration
• Consider the same case
  – S1 exports relation R1(patientID, brain_region, brain_vol)
  – S2 exports relation R2(species, brain_region, protein, density)
  – Ontology source Ont exports all relations and constraints shown before
  – Define an “integrated view”
    • V1(B, V, P, D) if R1(_, B1, V) ∧ R2(“human”, B2, P, D) ∧ part-of-tr(B2,B1)
  – A query against the view
    • Ans(Brain_region, Protein) if V1(Brain_region, V, Protein, D) ∧ D > 0 ∧ V < 0.25
    • Ans(Brain_region, Protein) if [R1(_, Brain_region1, V) ∧ V < 0.25] @S1 ∧ [R2(“human”, Brain_region2, Protein, D) ∧ D > 0] @S2 ∧ [part-of-tr(Brain_region2, Brain_region1)] @Ont
  – Issues
    • The possibility of having recursive queries and recursive views
    • Smart use of constraints in query evaluation
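
A minimal sketch of the ontology-extended plan, assuming hypothetical data: the equality join of the plain GAV case is replaced by a lookup in the transitive closure exported by Ont. The slide leaves open which of the two regions the head variable denotes; returning the S2 region here is an assumption, as are all tuples and function names.

```python
# Hypothetical data; none of these tuples come from the lecture.
R1 = [("p1", "cerebellum", 0.20)]                      # S1
R2 = [("human", "Purkinje cell layer", "calbindin", 2.0),
      ("human", "hippocampus", "neuN", 1.5)]           # S2
PART_OF = {("Purkinje cell layer", "cortex"),
           ("cortex", "cerebellum")}                   # Ont (single flavor)


def transitive_closure(edges):
    """Naive fixpoint; the recursion is the 'recursive query' issue."""
    closure = set(edges)
    while True:
        new = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if new <= closure:
            return closure
        closure |= new


def answer():
    part_of_tr = transitive_closure(PART_OF)                      # @Ont
    s1 = {region for (_pid, region, vol) in R1 if vol < 0.25}     # @S1
    s2 = {(region, protein) for (sp, region, protein, d) in R2
          if sp == "human" and d > 0}                             # @S2
    # Mediator: keep S2 regions that are (transitively) part of an S1 region.
    return sorted((b2, protein) for (b2, protein) in s2
                  for b1 in s1 if (b2, b1) in part_of_tr)


print(answer())
```

With an equality join this query would return nothing, because S1 and S2 report at different anatomical granularities; the ontology is what connects them.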

Using Constraints in Query Evaluation
• Techniques from Semantic Query Optimization
  – V1(B1, B2, P, D) if R1(_, B1, V) ∧ R2(“human”, B2, P, D) ∧ part-of-tr(B2,B1)
  – Ans(Brain_region, Density) if V1(“cerebellum”, Brain_region, “neuN”, Density)
  – IC1: Density = 0 if R2(“human”, “Purkinje cell”, “neuN”, Density)
  – IC2: Density2 = 0 if R2(S,B1,P,0) ∧ R2(S,B2,P,Density2) ∧ part-of-tr(B2,B1)
  – Modify the query
    • Ans(Brain_region, Density) if V1(“cerebellum”, Brain_region, “neuN”, Density) ∧ not(Brain_region = “Purkinje cell”) ∧ Residue(derived predicate)
  – How would you compute a residue? How complex/feasible is this computation?
  – But more importantly
    • How do you control evaluation of a recursive predicate in Ont by supplying integrity constraints from the mediator or a data source?
      » By invoking general recursion control mechanisms?
      – OPEN RESEARCH PROBLEM

The Registration Problem
• Suppose a semantic mediator system already exists with n sources
• A new source Sₙ₊₁ wants to join the mediator such that
  – The mediator can simply “read in” the source’s model without any disruptions
  – All existing integrated views can make “best effort” use of the new source seamlessly
• Problems:
  – What does the source need to declare about itself to the mediator?
  – How does the mediator use this information to assimilate the new source?

Source Description
• Conceptual Model
  – Local Ontology (ONT): the terminological vocabulary used by the schema
    • Properties of relationships in the ontology
  – Object Model (OM): the export schema
    • Ontological Grounding (ONTG): relationship between the export schema and the local ontology
  – Contextualization (CON): relationship of OM and ONT with the mediator’s knowledge base ONT(M)
• CSL: a language to express CON formulae

An Example
[Figure: conceptual model of an example source, with the parts below]
• Objects
  – Image(ID, XSize, YSize, Struct, Deposit)
  – Structure(ID, Name, Area, Volume, BBox)
  – Deposit(ID, Name, Type, Intensity, BBox)
• Associations
  – surrounds(Structure.ID, Structure.ID)
• Functions
  – deposit_loc(Deposit.ID) → Structure.ID
• Local Ontology
  – Concepts: cell, cytoplasm, nucleus, membrane, mitochondrion, cytosol, endosome, inner membrane, matrix, substance
  – Edge labels: has, stores
• Property of Local Ontology
  – tc_has(X) = trans_closure(has(X))
• Ontological Grounding
  – dom(Image.Struct) in tc_has(cytoplasm)
  – Structure.Name stores Protein

Roles of Ontological Grounding
• Semantic Constraints on Attribute Domains
  – Image.Struct has to be below Cytoplasm
• Refinement of Local Ontology
  – Cytoplasm stores substances, but instances of the exported object called Structure store only proteins
• Intensional Definitions
  – DENATURED_PROTEIN(ProtName) IF DEPOSIT(ID, ProtName, protein, dark, _), deposit_in_structure(ID) ≠ NULL;
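
The domain constraint "Image.Struct has to be below Cytoplasm" can be checked mechanically once tc_has is computable. The sketch below is illustrative only: the has-edges in HAS are an assumed reading of the example ontology, and violates_grounding is a hypothetical helper.

```python
# Assumed has-edges of the local ontology (the figure's exact edges are not
# recoverable, so this hierarchy is a guess for illustration).
HAS = {
    "cell": ["cytoplasm", "nucleus", "membrane"],
    "cytoplasm": ["mitochondrion", "cytosol", "endosome"],
    "mitochondrion": ["inner membrane", "matrix"],
}


def tc_has(concept):
    """All concepts reachable from `concept` via one or more has-edges."""
    seen = set()
    stack = list(HAS.get(concept, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(HAS.get(current, []))
    return seen


def violates_grounding(image_rows):
    """Rows whose Struct value breaks dom(Image.Struct) in tc_has(cytoplasm)."""
    allowed = tc_has("cytoplasm")
    return [row for row in image_rows if row["Struct"] not in allowed]


images = [{"ID": 1, "Struct": "matrix"}, {"ID": 2, "Struct": "nucleus"}]
print(violates_grounding(images))
```

Here the second row is reported because nucleus is reachable from cell but not from cytoplasm.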

Contextualization
• WHAT: Local schema elements are expressed as views over the mediator’s ontology
  – Recall: integrated views are still defined in a global-as-view (GAV) fashion
• WHY: The local-as-view (LAV) technique lets sources join the mediator, while queries against GAV views do not require an inverse-rule mapping

Context Specification Language
• Types of local schema elements
  – From Object Model: classes(S), attributes(S), associations(S), instances(S)
  – From Local Ontology: concepts(S), relationships(S)
  – From Both: constraints(S)
• Types of mediator’s schema elements
  – concepts(M), relationships(M), constraints(M)
• Context specification
  map (correspondence relation)(X1, …, Xn) IF type declarations, body
  – Correspondence relation: the name of the mapping
  – X1, …, Xn: the S elements and the M elements
  – Type declarations: types of the S and M elements
  – Body: the actual mapping definition

Context Specification Language
• map (subconcept)(cytoplasm, cell_compartment) IF cytoplasm:concepts(CCDB), cell_compartment:concepts(mediator)
  – Relates a concept of the local ontology (cytoplasm) to one of the mediator’s ontology (cell_compartment): cytoplasm is a cell compartment
• Consider a query at the mediator
  – “Which cell_compartments have associated images?”
  – The mapping enables the mediator to ask the CCDB source “Which ‘isa descendants’ of ‘cytoplasm’ have associated images?”
  – Using ontological grounding, the source can translate this to a query against the Image class
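
A minimal sketch of how a subconcept mapping could drive this rewriting. Everything except the mapping (cytoplasm, cell_compartment) is an assumption: the isa hierarchy CCDB_ISA, the image rows, and the function names are made up for illustration.

```python
# (local concept, source, mediator concept): the registered CSL mapping.
SUBCONCEPT = [("cytoplasm", "CCDB", "cell_compartment")]

# Hypothetical isa-children inside the CCDB source.
CCDB_ISA = {"cytoplasm": ["mitochondrion", "cytosol"],
            "mitochondrion": ["matrix"]}

# Hypothetical rows of the Image class.
CCDB_IMAGES = [{"ID": 1, "Struct": "matrix"},
               {"ID": 2, "Struct": "nucleus"},
               {"ID": 3, "Struct": "cytosol"}]


def isa_descendants(concept):
    """All isa descendants of a local concept."""
    found = set()
    stack = list(CCDB_ISA.get(concept, []))
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(CCDB_ISA.get(current, []))
    return found


def ask_ccdb(local_concept):
    """Source side: which isa descendants have associated images?"""
    wanted = isa_descendants(local_concept)
    return sorted({img["Struct"] for img in CCDB_IMAGES
                   if img["Struct"] in wanted})


def ask_mediator(mediator_concept):
    """Mediator side: rewrite the query through the subconcept mapping."""
    answers = []
    for local, source, mediator in SUBCONCEPT:
        if mediator == mediator_concept and source == "CCDB":
            answers.extend(ask_ccdb(local))
    return answers


print(ask_mediator("cell_compartment"))
```

The mediator never needs to know CCDB's internal hierarchy; it only needs the mapping to decide which local concept to ask about.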

Some Example Cases
• map (concept-concept)(regulates(nejire, CREB)) IF nejire:concepts(mediator), CREB:concepts(CCDB)
  – The mapping instantiates a relation (regulates) between the mediator’s concept nejire and CCDB’s concept CREB
  – Query enabled: “Find images with deposits of nejire-regulated proteins”
• map (concept-concept)(tc_regulates(nejire, CREB)) IF nejire:concepts(mediator), CREB:concepts(CCDB)
  – Query enabled: “Find images with deposits of proteins that are indirectly regulated by nejire”
  – The query will traverse the “regulates” edges in the mediator and the source to find all paths between nejire in the mediator and CREB in CCDB. The concepts on these paths will then be used to answer the query.
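
The path traversal behind tc_regulates can be sketched as a depth-first enumeration of simple paths. The intermediate concepts X1 and X2 and the edge list are placeholders invented for illustration; only nejire and CREB come from the slide.

```python
# Hypothetical "regulates" edges, pooled from the mediator and the source.
REGULATES = [
    ("nejire", "X1"),
    ("X1", "X2"),
    ("X2", "CREB"),
    ("X1", "CREB"),
]


def all_paths(start, goal, edges, path=None):
    """Enumerate all simple (cycle-free) paths from start to goal."""
    path = (path or []) + [start]
    if start == goal:
        return [path]
    paths = []
    for src, dst in edges:
        if src == start and dst not in path:
            paths.extend(all_paths(dst, goal, edges, path))
    return paths


def concepts_on_paths(start, goal, edges):
    """Intermediate concepts that lie on some path from start to goal."""
    found = set()
    for path in all_paths(start, goal, edges):
        found.update(path[1:-1])
    return found


print(sorted(all_paths("nejire", "CREB", REGULATES)))
print(sorted(concepts_on_paths("nejire", "CREB", REGULATES)))
```

Enumerating all simple paths is exponential in the worst case, so a real mediator would likely prefer reachability or closure computation when only the set of concepts is needed.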

Some Example Cases
• Relating edges
  map (assoc-rel)(surrounds(s1, s2), inverse(inside(s2, s1))) IF surrounds(s1, s2):assoc(CCDB), inside(s2, s1):relationships(mediator), not has_part(s1, s2)
  – The mediator’s ontology has a relationship “inside” and the source’s object model has an association called “surrounds”
  – They are almost inverses of each other
    • A surrounds B ⇔ B inside A, unless B part_of A
  – This brings out the conceptual difference between the source’s semantics of a relationship and the mediator’s semantics of the same relationship
  – The mapping will force the mediator to test the has_part condition before pushing a (rewritten) query to the CCDB source

Registration at Mediator
• The source sends the mediator its conceptual model, including the CSL mappings
• The mediator
  – Stores the description in a global registry
  – Updates ONT(M) with new relationships or rules about the relationships, duly tagged by the source name
  – Translates ontological groundings to executable rules
    • domain(STRUCTURE.volume) in [0,300] becomes
      false :– X:structure[volume → V], not (0 < V < 300)

Registration at Mediator
  – Translates each CSL statement to two rules
    map (subrelation)(has(co), has_part) IF has(co):relationships(CCDB), has_part:relationships(mediator)
    translates to:
    has_part(X,Y) :– CCDB.has(co)(X,Y)    (derive)
    false :– CCDB.has(co)(X,Y), not has_part(X,Y)    (denial)
    • The first rule is an IDB (intensional database) rule for has_part
    • The second rule is an integrity constraint
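
The derive/denial pair can be read as two set operations. This is a sketch under assumed data: the fact sets and the function names derive and denial_violations are hypothetical.

```python
# Hypothetical facts.
CCDB_HAS_CO = {("cell", "nucleus"), ("cell", "cytoplasm")}   # CCDB.has(co)
MEDIATOR_HAS_PART = {("brain", "cerebellum")}                # has_part at mediator


def derive(has_part, ccdb_has_co):
    """has_part(X,Y) :- CCDB.has(co)(X,Y)."""
    return has_part | ccdb_has_co


def denial_violations(has_part, ccdb_has_co):
    """false :- CCDB.has(co)(X,Y), not has_part(X,Y).

    Returns the (X, Y) pairs that would make the denial fire.
    """
    return ccdb_has_co - has_part


before = denial_violations(MEDIATOR_HAS_PART, CCDB_HAS_CO)
after = denial_violations(derive(MEDIATOR_HAS_PART, CCDB_HAS_CO), CCDB_HAS_CO)
print(before, after)
```

Once the derive rule has been applied the denial can no longer fire on these facts; the two rules together state that CCDB's has(co) is contained in the mediator's has_part.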

Ontologies of Processes
• What is a Process?
  – From Merriam-Webster, sense 2:
    a (1): a natural phenomenon marked by gradual changes that lead toward a particular result <the process of growth>; (2): a natural continuing activity or function <such life processes as breathing>
    b: a series of actions or operations conducing to an end; especially: a continuous operation or treatment
• Revisiting the Central Theme of Formal Ontology
  – Given a logical language L ...
    • ... a conceptualization is a set of models of L which describes the admissible (intended) interpretations of its non-logical symbols (the vocabulary)
    • ... an ontology is a (possibly incomplete) axiomatization of a conceptualization.
  – Theory of formal distinctions among things and relations
  – Basic tools
    • Theory of parthood
    • Theory of integrity
    • Theory of identity
    • Theory of dependence

Disease Maps: “Designing” an Ontology
• On-going work (Gupta, Ludäscher, Martone, Grethe)
  – Goal: to characterize the processes, manifestations and outcomes of a specific disease (or family of diseases)
  – A node- and edge-labeled multigraph where logical formulae can be constructed over subsets of edge labels to describe
    • Transitive relations
    • Temporal relations
    • Causal relations
    • …
  – Views
    • A subgraph that reflects the viewpoint of a specific discipline
  – Elaborations and Abstractions
    • A “zoom in” ability where a subgraph may be the detail of another, smaller subgraph
  – Query Support
    • Path and subgraph extraction, closure computation, graph aggregates, homomorphic graph matching, consequence derivation
• Can such an ontology be constructed with one formalism? How do you combine different formalisms and still obtain the right conclusions?

Apoptosis (Suicide of a Cell)
• Processes have phases (temporal part-of)
  – Every process P goes through the phases
    • initiate-progress-terminate
  – Every phase can be progressively divided into finer sub-phases
[Figure: phase diagram of Apoptosis. Phase labels: Triggering Event, Receipt of Death Signal, Degeneration, Disintegration. Sub-phase labels: shrink, mitochondria break down, release of cytochrome c, bleb development on surface, degradation of chromatin in nucleus.]

An Intuitive Attempt to Formalize
• Let S0 be an initial situation
• Let occurs be a distinguished binary function symbol
  – occurs(e, s) denotes a successor situation to situation s resulting from event e
  – events may be parameterized
    • degrades(chromatin, nucleus) may mean that chromatin degrades in the nucleus
    • occurs(degrades(chromatin, nucleus), s) denotes the situation that results from the degradation of chromatin when the current situation is s
    • occurs(degrades(chromatin, nucleus), occurs(bleb_development, occurs(release(cytochrome_c), S1)))
      refers to the sequence of events
      [release(cytochrome_c), bleb_development, degrades(chromatin, nucleus)]
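
Because the innermost occurs is the earliest event, a nested situation term reads "inside out". A small Python sketch makes this explicit; the Occurs class and event_sequence function are illustrative names, and events are kept as plain strings for simplicity.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Occurs:
    """The situation term occurs(event, situation)."""
    event: str
    situation: "Situation"


# A plain string names a starting situation such as "S0" or "S1".
Situation = Union[str, Occurs]


def event_sequence(situation):
    """Unwind a nested occurs term into events in chronological order."""
    events = []
    while isinstance(situation, Occurs):
        events.append(situation.event)
        situation = situation.situation
    return list(reversed(events))


term = Occurs("degrades(chromatin, nucleus)",
              Occurs("bleb_development",
                     Occurs("release(cytochrome_c)", "S1")))
print(event_sequence(term))
```

A situation is therefore nothing more than a starting situation plus a history of events, which is why terms can be compared and ordered structurally.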

A Step Back: Second Order Logic
• First order logic permits
  – quantification over individuals
• Second order logic permits
  – quantification over predicates and functions
• Thus a second order logic has
  – Predicate variables X^n_1, X^n_2, … (infinitely many n-place predicate variables)
  – Function variables F^n_1, F^n_2, … (infinitely many n-place function variables)
• Second order logic is incomplete!
  – It is not possible to have an axiomatization and rules of inference that can recursively enumerate all and only the valid second-order sentences
  – However, second order theories and their special cases are useful for developing the ontological basis for processes

Situation Calculus [McCarthy, Reiter, Levesque] (adapted for our purpose)
• L_sitcalc is a second order language with equality
  – Sorts: events, situations, objects
  – Logical Symbols and Quantifiers: ∧, ∨, ¬, ∀, ∃
  – Function Symbols of sort situation:
    • Constant symbol S0, called the initial situation
    • Binary function occurs: event × situation → situation
  – Binary predicate symbol ⊏: situation × situation
    • Defines an ordering relation (temporal part-of) on situations
  – Binary predicate symbol poss: event × situation
    • poss(a, s) means it is possible for event a to occur in situation s
  – Countably infinitely many symbols for
    • n-ary predicates of sort (event ∪ object)^n
    • Functions of sort (event ∪ object)^n → object and (event ∪ object)^n → event

Situation Calculus (adapted)
• Relational Fluents
  – Infinitely many predicate symbols of sort (event ∪ object)^n × situation
  – They are situation-dependent relations, i.e., predicates with situation-dependent truth values
  – binds-to(FasL, cell-surface) is a relationship between FasL and cell-surface, but it is not always true
  – binds-to(FasL, cell-surface, occurs(bound(toxic-T-cell, target), s))
• Functional Fluents
  – Infinitely many function symbols of sort (event ∪ object)^n × situation → event ∪ object
  – Since chromatin-content(cell) varies with the state of apoptosis, represent it as chromatin-content(cell, s), where s is a situation term

Examples of Fluents
Neurotoxin ‘MPTP’ is converted to ‘MPP+’ by ‘MAOB’ in the synaptic cleft. The active form ‘MPP+’ is picked up by the dopamine transporter, and released inside the neuron, where it accumulates in mitochondria. This leads to complex I (an antioxidant) inhibition, which leads to free radical generation.
• Functional fluent location(MPP+)
  – initially: location(MPP+) = synaptic_cleft
  – occurs(uptake_by(DAT)): location(MPP+) = bound_to(DAT)
  – occurs(release_by(DAT)): location(MPP+) = inside(neuron)
  – occurs(transport_to(DAT,nucleus)): location(MPP+) = inside(mitochondria)
• Relation content(Organelle, Substance, Concentration)
  – initially: content(mitochondria, MPP+, 0)
  – occurs(transport_to(nucleus)): content(mitochondria, MPP+, inc(0))
  – occurs(transport_to(nucleus)): content(mitochondria, MPP+, inc(inc(0)))
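
The fluent examples can be replayed with a short Python sketch that computes fluent values from an event history. The event names are copied from the slide as strings; the EFFECTS table, the function names, and the use of an integer counter in place of inc(...) are assumptions for illustration.

```python
# Effect of each event on the functional fluent location(MPP+).
EFFECTS = {
    "uptake_by(DAT)": "bound_to(DAT)",
    "release_by(DAT)": "inside(neuron)",
    "transport_to(DAT,nucleus)": "inside(mitochondria)",
}


def location(events, initial="synaptic_cleft"):
    """Value of location(MPP+) after the given event history."""
    value = initial
    for event in events:
        # Events without an entry in EFFECTS leave the fluent unchanged.
        value = EFFECTS.get(event, value)
    return value


def content(events):
    """Concentration in content(mitochondria, MPP+, C); inc(x) modeled as x + 1."""
    level = 0
    for event in events:
        if event == "transport_to(nucleus)":
            level += 1
    return level


history = ["uptake_by(DAT)", "release_by(DAT)"]
print(location(history))
print(content(["transport_to(nucleus)", "transport_to(nucleus)"]))
```

The default branch in location, where an unrelated event leaves the value untouched, is exactly the kind of persistence that frame axioms have to state explicitly in the logic.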

The Frame Problem
• Events have
  – Preconditions
    • Poss(breakdown(mitochondria), s) → releases(Bcl2, Apaf1, s) ∧ leaks(cytochrome-c, mitochondria, s)
  – Effect Axioms
    • An effect axiom states how an event affects the value of a fluent
      – membrane(x, cell) ∧ ion(y) ∧ permeable(x, y) → enters(y, cell, occurs(high-conc(y, outside(x)), s))
• Fluents have
  – Frame Axioms
    • A frame axiom specifies the event invariants (fluents that are not affected by an event) of a domain
    • Positive frame axiom
      – content(mitochondria, y, V, s) → content(mitochondria, y, V, occurs(enters(y, cell), s))
    • Negative frame axiom
      – ¬high-conc(x, cell, s) ∧ [x ≠ y] → ¬high-conc(x, cell, occurs(high-conc(y, outside(cell)), s))
• If there are E events and F fluents, on the order of 2 × E × F frame axioms may be needed!

Toward a Conclusion
• Solutions for the Frame Problem
  – Causal Completeness Assumption
    • We know all preconditions under which an event causes a fluent to change values to a successor state
  – Explanation Closure Assumption
    • We know all events that may cause a fluent to change its value
  – Unique Name Assumption
    • Identical events have identical attributes
  – Then, the number of axioms can be reduced to the order of E+F provided
    • conditional, iterative, recursive and nondeterministic events do not occur
• For a multi-theory ontology like a disease map
  – We need much more than a description logic and a situation calculus
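
Under the explanation closure assumption, each fluent needs only one rule of the form "true after an event if the event makes it true, or it was already true and the event does not make it false" (Reiter calls this a successor state axiom). The sketch below illustrates that shape; the fluent and event names and the CAUSES_TRUE / CAUSES_FALSE tables are assumptions, not part of the lecture.

```python
# One entry per fluent: the complete list of events that can change it.
# Completeness of these lists is the explanation closure assumption.
CAUSES_TRUE = {
    "leaks(cytochrome-c, mitochondria)": {"breakdown(mitochondria)"},
}
CAUSES_FALSE = {
    "leaks(cytochrome-c, mitochondria)": set(),
}


def holds(fluent, events, initially=frozenset()):
    """Truth value of a relational fluent after an event history."""
    value = fluent in initially
    for event in events:
        if event in CAUSES_TRUE.get(fluent, set()):
            value = True
        elif event in CAUSES_FALSE.get(fluent, set()):
            value = False
        # Any other event leaves the fluent unchanged: no frame axiom needed.
    return value


fluent = "leaks(cytochrome-c, mitochondria)"
print(holds(fluent, ["bleb_development"]))
print(holds(fluent, ["bleb_development", "breakdown(mitochondria)"]))
```

With one such rule per fluent and one precondition rule per event, the axiom count grows as E+F rather than 2 × E × F.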

References
1. D. Leivant, “Higher Order Logic”, in D. M. Gabbay, C. J. Hogger and J. A. Robinson (eds.), Handbook of Logic in Artificial Intelligence and Logic Programming, pp. 229-321, Clarendon Press, Oxford, 1994.
2. A. Gupta, B. Ludäscher, M. E. Martone, “Registering Scientific Information Sources for Semantic Mediation”, 21st International Conference on Conceptual Modeling (ER), Tampere, Finland, pp. 182-198, October 2002.
3. J. McCarthy, “Situations, actions and causal laws”, Tech. Report, Stanford Univ., 1968.
4. R. Reiter, Knowledge in Action, The MIT Press, Cambridge, MA, 2001.
5. P. Godfrey, J. Grant, J. Gryz, and J. Minker, “Integrity constraints: Semantics and applications”, in J. Chomicki and G. Saake (eds.), Logics for Databases and Information Systems, Kluwer, 1998.
6. U. Chakravarthy, J. Grant, and J. Minker, “Logic-based approach to semantic query optimization”, ACM Transactions on Database Systems, 15(2), pp. 162-207, 1990.
7. R. Kowalski and M. Sergot, “A logic-based calculus of events”, New Generation Computing, 4, pp. 67-95, 1986.