Model-Based Mediation with Domain Maps
Bertram Ludäscher* Amarnath Gupta*
Maryann E. Martone+
*San Diego Supercomputer Center (SDSC)
+National Center for Microscopy and Imaging Research (NCMIR)
University of California, San Diego (UCSD)
Overview
• Motivation
  – Problem with current Mediator Architecture
  – Complex Scientific Multiple-World Scenarios
• Model-Based Mediation Architecture
  – Lifting from XML to level of Conceptual Models (CMs)
• Formal Framework
  – Domain Maps (DMs)
  – Generic Conceptual Model GCM
  – Integrated View Definition
• Example Query Evaluation
• Open Issues
A Standard Mediator Architecture (MIX -- Mediation of Information using XML, SDSC/UCSD)
[Figure: a USER-Query goes to the MIX MEDIATOR, which exposes an INTEGRATED VIEW given by an XML Integrated View Definition (XMAS/XQuery). The mediator exchanges XML queries and answers (XML Q/A) with one Wrapper per data source (DB, files, WWW) at Lab1, Lab2, and Lab3.]
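To make this purely structural style of integration concrete, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. It is not MIX or XMAS: the two wrapper functions, their element names, and the sample values are hypothetical, and the "integrated view" is just renaming plus union. Nothing in it knows what a protein amount means, which is the limitation the next slide raises.

```python
import xml.etree.ElementTree as ET

# Hypothetical wrappers: each returns an XML answer in its own source schema.
def wrapper_lab1() -> str:
    return "<lab1><rec><prot>RyR</prot><val>30</val></rec></lab1>"

def wrapper_lab2() -> str:
    return "<lab2><item protein='IP3R' amount='12'/></lab2>"

def integrated_view() -> ET.Element:
    """Structural integration only: rename each source's elements onto one
    target element <measurement protein=... amount=.../> and union them."""
    view = ET.Element("integrated")
    for rec in ET.fromstring(wrapper_lab1()).iter("rec"):
        ET.SubElement(view, "measurement",
                      protein=rec.findtext("prot"), amount=rec.findtext("val"))
    for item in ET.fromstring(wrapper_lab2()).iter("item"):
        ET.SubElement(view, "measurement",
                      protein=item.get("protein"), amount=item.get("amount"))
    return view

if __name__ == "__main__":
    print(ET.tostring(integrated_view(), encoding="unicode"))
```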
The Problem: Complex Multiple-World Scenarios
• Current Integration Issues
  – Structural/Schema Conflicts
    • common semistructured data model (XML)
    • schema transformations/integration (XML queries & transforms)
  – Limited Query Capabilities
    • capability-based rewriting (e.g., TSIMMIS)
  – ...
• BUT scenarios are “one-world” (amazon.com vs. bn.com) or simple multiple-world (home buyer)
• Problem: No Support for Semantic Mediation
  – “complex multiple-world” scenarios (Neuroscience, Geoscience):
    • complex, disjoint, seemingly unrelated data
    • “hidden semantics” in complex, indirect relationships
A Neuroscience Question
What is the cerebellar distribution of rat proteins with more than 70% homology with human NCS-1? Any structure specificity? How about other rodents?
[Figure: three sources, protein localization (NCMIR), neurotransmission (SENSELAB), and morphometry (SYNAPSE), each behind a Wrapper; the Mediator, the Integrated View, and the Integrated View Definition are all marked "???".]
Hidden Semantics: Protein Localization (NCMIR)
<protein_localization>
  <neuron type="purkinje cell"/>
  <protein channel="red"><name>RyR</name>…</protein>
  <region h_grid_pos="1" v_grid_pos="A">
    <density>
      <structure fraction="0.8">
        <name>spine</name><amount name="RyR">0</amount>
      </structure>
      <structure fraction="0.2">
        <name>branchlet</name><amount name="RyR">30</amount>
      </structure>
      …
[Figure callouts: Molecular layer of Cerebellar Cortex; Purkinje Cell layer of Cerebellar Cortex; Fragment of dendrite]
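A minimal sketch of what a purely syntactic reader gets out of this fragment, assuming Python's xml.etree.ElementTree and a version of the fragment closed so that it parses (the elided parts are dropped). The function name structure_records is hypothetical. Every value it returns is a bare string or number: the document never says which cerebellar layer grid cell (1, A) lies in, or that "branchlet" denotes a fragment of a dendrite.

```python
import xml.etree.ElementTree as ET

# The NCMIR fragment above, closed so that it parses (elided parts omitted).
NCMIR_XML = """
<protein_localization>
  <neuron type="purkinje cell"/>
  <protein channel="red"><name>RyR</name></protein>
  <region h_grid_pos="1" v_grid_pos="A">
    <density>
      <structure fraction="0.8"><name>spine</name><amount name="RyR">0</amount></structure>
      <structure fraction="0.2"><name>branchlet</name><amount name="RyR">30</amount></structure>
    </density>
  </region>
</protein_localization>
"""

def structure_records(xml_text: str):
    """Return (grid position, structure name, fraction, amount) tuples.
    The anatomical meaning of the grid position and of the structure
    names is not recoverable from the XML itself."""
    root = ET.fromstring(xml_text.strip())
    out = []
    for region in root.iter("region"):
        pos = (region.get("h_grid_pos"), region.get("v_grid_pos"))
        for s in region.iter("structure"):
            out.append((pos, s.findtext("name"),
                        float(s.get("fraction")), int(s.findtext("amount"))))
    return out
```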
Hidden Semantics: Morphometry (SYNAPSE)
<neuron name="purkinje cell">
  <branch level="10">
    <shaft>…</shaft>
    <spine number="1">
      <attachment x="5.3" y="-3.2" z="8.7"/>
      <length>12.348</length>
      <min_section>1.93</min_section>
      <max_section>4.47</max_section>
      <surface_area>9.884</surface_area>
      <volume>7.930</volume>
      <head>
        <width>4.47</width>
        <length>1.79</length>
      </head>
    </spine>
    …
Hidden semantics (expert annotations):
• Branch level beyond 4 is a branchlet.
• The spine must be dendritic, because Purkinje cells don’t have somatic spines.
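The two expert annotations are exactly the kind of knowledge the XML leaves unstated. A minimal sketch of them as explicit rules, assuming the neuron type and branch level have already been read from the document; the class and function names and the "unknown" fallback for other cell types are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Spine:
    neuron_type: str   # from <neuron name=...>, e.g. "purkinje cell"
    branch_level: int  # from <branch level=...>

def is_branchlet(branch_level: int) -> bool:
    # Expert rule: branch level beyond 4 is a branchlet.
    return branch_level > 4

def spine_location(spine: Spine) -> str:
    # Expert rule: Purkinje cells have no somatic spines, so any spine
    # reported for a Purkinje cell must be dendritic.
    if spine.neuron_type == "purkinje cell":
        return "dendritic spine"
    return "unknown"  # other cell types would need their own rules

def lift(spine: Spine) -> dict:
    """Attach the implicit context that the XML itself does not state."""
    return {"location": spine_location(spine),
            "on_branchlet": is_branchlet(spine.branch_level)}
```

For the fragment above, lift(Spine("purkinje cell", 10)) classifies the spine as a dendritic spine on a branchlet.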
Approach: Model-Based Mediation
• Complex Multiple Worlds Integration Problem
  – terms not directly joinable
  – complex, indirect associations
  – unstated, “hidden” semantics (not just schema conflicts)
• Missing “Semantic Link”
  => how to define complex, indirect semantic links?
  => lift mediation to the level of conceptual models (CMs)
  => domain expert’s knowledge formalized as rules over CMs
  => Model-Based Mediation
XML-Based vs. Model-Based Mediation
[Figure: two stacks side by side.
• XML-based mediation: Raw Data → XML Elements → XML Models with structural constraints only (DTDs: parent, child, sibling, ...; e.g., A = (B*|C), D; B = ...); no domain constraints; Integrated-DTD := XQuery(Src1-DTD, ...)
• Model-based mediation: Raw Data → (XML) Objects → Conceptual Models (classes, relations, is-a, has-a, ...; e.g., classes C1, C2, C3 and a relation R) linked to a DOMAIN MAP; logical domain constraints (IF ... THEN ... rules); Integrated-CM := CM-QL(Src1-CM, ...)]
Extended Mediator Architecture
• Wrappers export Conceptual Models (CMs)
  – facts & rules for classes, relationships, ICs, ...
  – source data is “put into context” (“aboutness” index) by linking to domain maps (DMs)
• Mediator employs CMs and DMs
  – ... to define complex semantic relationships on the formalized domain knowledge
• Generic Conceptual Model (GCM)
  – as a common target CM
  – minimal requirements/core expressions:
    • instance(O,C), subclass(C1,C2)
    • method_type(C,M,C’), method_value(O,M,R)
    • relation_type(R,A1/C1,...,An/Cn)
    • relation_value(R,a1,...,an)
• Expressiveness, Extensibility
  – allow inductive properties (inheritance, closures, ...)
  – employ a declarative rule language (e.g., F-Logic)
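A minimal Python sketch of the GCM core expressions as plain fact sets, with the inductive properties mentioned above (subclass closure, inheritance of instances and method signatures, type checking of relation tuples) computed by naive fixpoint iteration. It stands in for a rule language such as F-Logic only for illustration; the class, method, and relation names and all data are hypothetical.

```python
# Hypothetical GCM facts; predicate names follow the slide, data is invented.
instance = {("s1", "mushroom_spine"), ("r1", "region")}
subclass = {("mushroom_spine", "spine"), ("spine", "neuron_component")}
method_type = {("spine", "head_width", "float")}
method_value = {("s1", "head_width", 4.47)}
relation_type = {"located_in": [("obj", "spine"), ("area", "region")]}
relation_value = {("located_in", "s1", "r1")}

def transitive_closure(pairs):
    """Smallest transitive relation containing `pairs` (naive fixpoint)."""
    closure = set(pairs)
    while True:
        step = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if step <= closure:
            return closure
        closure |= step

def all_instances():
    """Inheritance: instance(O, C) and subclass+(C, D) imply instance(O, D)."""
    sub_star = transitive_closure(subclass)
    return instance | {(o, d) for (o, c) in instance
                       for (c2, d) in sub_star if c == c2}

def signatures_of(obj):
    """method_type(C, M, T) inherited by obj from every class C it belongs to."""
    classes = {c for (o, c) in all_instances() if o == obj}
    return {(m, t) for (c, m, t) in method_type if c in classes}

def method_values_typed():
    """Every method_value(O, M, R) has an inherited signature for M."""
    return all(any(m == m2 for (m2, _t) in signatures_of(o))
               for (o, m, _v) in method_value)

def well_typed(fact):
    """relation_value(R, a1..an) conforms to relation_type(R, A1/C1..An/Cn)."""
    name, *args = fact
    sig = relation_type[name]
    inst = all_instances()
    return len(args) == len(sig) and all(
        (a, c) in inst for a, (_attr, c) in zip(args, sig))
```

Because s1 is a mushroom_spine and mushroom_spine :: spine, s1 inherits the head_width signature declared only on spine.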
Model-Based Mediator Architecture
[Figure: the USER/Client talks to the Mediator Engine, an XSB engine with FL rule processing, LP rule processing, and graph processing, which holds the Domain Map (DM), the Integrated View Definition (IVD), and the resulting CM (Integrated View). Each source S1, S2, S3 sits behind an XML-Wrapper and a CM-Wrapper that exports a GCM-based CM (CM S1, CM S2, CM S3). Further labels: Logic API (capabilities); CM Plug-In; CM Queries & Results (exchanged in XML).]
Formalizing Domain Knowledge: Domain Map for SYNAPSE and NCMIR
A domain map comprises
• Description Logic facts ...
  - concepts ("classes")
  - roles ("associations")
• derived properties ...
• ... expressed as logic rules (e.g., F-logic)
[Figure: domain map]
Domain expert knowledge:
"Purkinje cells and Pyramidal cells have dendrites that have higher-order branches that contain spines. Dendritic spines are ion (calcium) regulating components. Spines have ion binding proteins. Neurotransmission involves ionic activity (release). Ion-binding proteins control ion activity (propagation) in a cell. Ion-regulating components of cells affect ionic activity (release)."
[Figure: equivalent Description Logic facts]
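A minimal sketch of the expert's sentences as a domain map: labeled edges (concept, role, concept) plus one derived-property rule that inherits roles along isa. The concept names (e.g., higher_order_branch) and role labels are our own hand transcription, and Python stands in for a description logic or F-logic only for illustration.

```python
# Domain map as labeled edges (concept, role, concept), transcribed by hand
# from the expert text; names are hypothetical.
DM = {
    ("purkinje_cell", "has", "dendrite"),
    ("pyramidal_cell", "has", "dendrite"),
    ("dendrite", "has", "higher_order_branch"),
    ("higher_order_branch", "contains", "spine"),
    ("spine", "isa", "ion_regulating_component"),
    ("spine", "has", "ion_binding_protein"),
    ("neurotransmission", "involves", "ionic_activity"),
    ("ion_binding_protein", "controls", "ionic_activity"),
    ("ion_regulating_component", "affects", "ionic_activity"),
}

def derive(facts):
    """Naive fixpoint for one derived-property rule: if X isa Y and Y has
    role R to Z, then X has role R to Z (for R == isa this is transitivity)."""
    facts = set(facts)
    while True:
        new = {(x, r, z)
               for (x, r1, y) in facts if r1 == "isa"
               for (y2, r, z) in facts if y2 == y}
        if new <= facts:
            return facts
        facts |= new

def reachable(facts, start, roles=("has", "contains")):
    """Concepts reachable from start via part-of-like roles."""
    seen, todo = set(), [start]
    while todo:
        c = todo.pop()
        for (s, r, o) in facts:
            if s == c and r in roles and o not in seen:
                seen.add(o)
                todo.append(o)
    return seen
```

With these facts, derive(DM) contains ("spine", "affects", "ionic_activity"), a link stated nowhere in the text but implied by it; this is the kind of indirect connection between morphometry and neurotransmission data that the mediator needs.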
Domain Map Refinement
In addition to registering (“hanging off”) data, a source may also refine the mediator’s domain map...
... source can register new concepts at the mediator ...
Definition of Integrated Views (Deja Vu?) ...
• XML/CM-2-FL Translators
<!ELEMENT Studies (Study)*>
<!ELEMENT Study (study_id, … animal, experiments, experimenters)>
<!ELEMENT experiments (experiment)*>
<!ELEMENT experiment (description, instrument, parameters)>
studyDB[studies =>> study].
study[study_id => string; … animal => animal; experiments =>> experiment; experimenters =>> string].
…
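A minimal sketch of the core idea behind an XML-to-F-logic translator: repeated children become set-valued signatures (=>>), all others single-valued (=>). The function name dtd_to_flogic and its simplifications (sequences only, no choices or nested groups, each child typed by its own element name) are assumptions; the hand translation above also renames and folds container elements (e.g., Studies becomes attribute studies of studyDB), which this sketch does not attempt.

```python
import re

ELEMENT_DECL = re.compile(r"<!ELEMENT\s+(\w+)\s+\((.*)\)(\*?)\s*>")

def dtd_to_flogic(decl: str) -> str:
    """Translate one <!ELEMENT name (c1, c2*, ...)> declaration into an
    F-logic-style signature: repeated children become set-valued (=>>),
    all others single-valued (=>). Simplifications: sequences only (no
    '|' choices, nested groups, or #PCDATA), and each child's type is
    just its own element name."""
    m = ELEMENT_DECL.fullmatch(decl.strip())
    if m is None:
        raise ValueError(f"unsupported declaration: {decl!r}")
    name, body, group_star = m.groups()
    parts = []
    for child in (c.strip() for c in body.split(",")):
        repeated = child.endswith(("*", "+")) or group_star == "*"
        child = child.rstrip("*+?").lower()
        parts.append(f"{child} {'=>>' if repeated else '=>'} {child}")
    return f"{name.lower()}[{'; '.join(parts)}]"
```

For example, the experiments declaration above yields experiments[experiment =>> experiment].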
• Specification of Domain Knowledge
  – Subclasses
      mushroom_spine :: spine
  – Data Classification
      DERIVE S:mushroom_spine FROM S:spine[head -> _; neck -> _].
  – Integrity Constraints
      ic1(S): ALERT[type -> "invalid spine"; object -> S] IF S:spine[undef ->> {head, neck}].
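A minimal Python sketch of the classification rule and the integrity constraint, assuming spine objects carry optional head and neck data. We read S:spine[undef ->> {head, neck}] as "neither head nor neck is defined"; that reading, the SpineObj class, and the sample objects are our own assumptions.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SpineObj:
    oid: str
    head: Optional[dict] = None   # e.g. {"width": 4.47, "length": 1.79}
    neck: Optional[dict] = None

def mushroom_spines(spines):
    """DERIVE S:mushroom_spine FROM S:spine[head -> _; neck -> _]:
    any spine with both a head and a neck is classified as a mushroom spine."""
    return [s.oid for s in spines if s.head is not None and s.neck is not None]

def ic1_alerts(spines):
    """ic1(S): ALERT[...] IF S:spine[undef ->> {head, neck}], read here as
    'neither head nor neck is defined' (our interpretation)."""
    return [{"type": "invalid spine", "object": s.oid}
            for s in spines if s.head is None and s.neck is None]

spines = [SpineObj("s1", head={"width": 4.47}, neck={"length": 0.5}),
          SpineObj("s2", head={"width": 1.0}),
          SpineObj("s3")]
```

Here s1 is classified as a mushroom spine, s2 is a plain spine, and s3 raises the "invalid spine" alert.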
... Definition of Integrated Views (Multiple Sources)
• Integrated View Definition
DERIVE protein_distribution(Protein, Organism, Brain_region, Feature_name, Anatom, Value)
FROM I:protein_label_image[proteins ->> {Protein}; organism -> Organism;
         anatomical_structures ->> {AS:anatomical_structure[name -> Anatom]}],   % from PROLAB
     AS..segments..features[name -> Feature_name; value -> Value],
     NAE:neuro_anatomic_entity[name -> Anatom;   % from ANATOM
         located_in ->> {Brain_region}].
• Schema Reasoning & Dynamic Classes
TAXON DB Schema:
taxon[subspecies => string; species => string; genus => string; … phylum => string; kingdom => string; superkingdom => string].
TAXON Rank Hierarchy: subspecies :: species :: genus :: … :: kingdom :: superkingdom
Create Classes from TAXON data:
DERIVE T:TR, TR::TR1 FROM T:'TAXON'.taxon[Taxon_Rank -> TR, Taxon_Rank1 -> TR1], Taxon_Rank::Taxon_Rank1.
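A minimal sketch of the "create classes from TAXON data" rule: for a taxon record T with values at ranks r below r1, T becomes an instance of the value at r, and that value becomes a subclass of the value at r1, so the data values mirror the rank hierarchy. The sketch uses only ranks named on the slide (the elided middle ranks are left out), adds T's membership in every rank value directly (F-logic would get the higher memberships by inheritance), and uses two illustrative records; the function names are hypothetical.

```python
# Only ranks named on the slide; the elided middle ranks ("…") are left out.
RANKS = ["subspecies", "species", "genus", "phylum", "kingdom", "superkingdom"]

def derive_taxon_classes(records):
    """For taxon record T with values at ranks r below r1, derive
    T : value(r) and value(r) :: value(r1), mirroring the rank hierarchy."""
    instances, subclasses = set(), set()
    for tid, rec in records.items():
        present = [r for r in RANKS if r in rec]      # lowest rank first
        for i, r in enumerate(present):
            instances.add((tid, rec[r]))
            for r1 in present[i + 1:]:
                subclasses.add((rec[r], rec[r1]))
    return instances, subclasses

def members(cls, instances):
    """All taxon records that are instances of the given derived class."""
    return {t for (t, c) in instances if c == cls}

records = {
    "t1": {"species": "Rattus norvegicus", "genus": "Rattus", "phylum": "Chordata"},
    "t2": {"species": "Mus musculus", "genus": "Mus", "phylum": "Chordata"},
}
```

Once such classes exist, a question like "how about other rodents?" can be asked against a higher-rank class instead of enumerating species by hand.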
Query Evaluation Example
"How does the parallel fiber output (Yale/SENSELAB) relate to the distribution of Ryanodine Receptors (UCSD/NCMIR)?"
• push selection
  – @SENSELAB: X1 := select output from parallel fiber;
• determine source context
  – @MEDIATOR: X2 := “hang off” X1 from Domain Map;
• compute region of interest (here: downward closure)
  – @MEDIATOR: X3 := subregion-closure(X2);
• push selection
  – @NCMIR: X4 := select PROT-data(X3, Ryanodine Receptors);
• compute protein distribution
  – @MEDIATOR: X5 := compute aggregate(X4);
Deductive closure of “has_a” with “tc(is_a)” (YES: real recursive views!! ;-)  [Screenshot: ANATOM CLOSURE]
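A minimal sketch of the five-step plan, with the region of interest computed as the deductive closure of has_a with tc(is_a): subclasses inherit parts, then has_a is closed transitively. The source contents, the term-to-concept mapping, and all function names are hypothetical stand-ins; the selections at SENSELAB and NCMIR would really be pushed to the remote sources instead of filtering local lists.

```python
# Hypothetical stand-ins for source contents and the domain map.
SENSELAB_OUTPUTS = {"parallel fiber": ["Purkinje cell dendrite"]}
DM_TERMS = {"Purkinje cell dendrite": "purkinje_dendrite"}  # source term -> DM concept
IS_A = {("purkinje_dendrite", "dendrite")}
HAS_A = {("dendrite", "branchlet"), ("branchlet", "spine")}
NCMIR_ROWS = [("branchlet", "RyR", 30), ("spine", "RyR", 0), ("spine", "IP3R", 5)]

def tc(pairs):
    """Transitive closure by naive fixpoint iteration."""
    closure = set(pairs)
    while True:
        new = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if new <= closure:
            return closure
        closure |= new

def has_a_star(has_a, is_a):
    """Deductive closure of has_a with tc(is_a): subclasses inherit parts
    (X is_a+ Y, Y has_a Z => X has_a Z); then has_a is closed transitively."""
    inherited = {(x, z) for (x, y) in tc(is_a) for (y2, z) in has_a if y == y2}
    return tc(set(has_a) | inherited)

def senselab_select_output(part):          # X1, pushed to SENSELAB
    return SENSELAB_OUTPUTS.get(part, [])

def hang_off(terms):                       # X2, mediator: situate in the DM
    return {DM_TERMS[t] for t in terms if t in DM_TERMS}

def subregion_closure(concepts):           # X3, mediator: downward closure
    parts = has_a_star(HAS_A, IS_A)
    return set(concepts) | {z for (x, z) in parts if x in concepts}

def ncmir_select(regions, protein):        # X4, pushed to NCMIR
    return [row for row in NCMIR_ROWS if row[0] in regions and row[1] == protein]

def aggregate(rows):                       # X5, mediator
    totals = {}
    for structure, _protein, amount in rows:
        totals[structure] = totals.get(structure, 0) + amount
    return totals

def evaluate(part, protein):
    x1 = senselab_select_output(part)
    x2 = hang_off(x1)
    x3 = subregion_closure(x2)
    x4 = ncmir_select(x3, protein)
    return aggregate(x4)
```

With this toy data, evaluate("parallel fiber", "RyR") reaches the spine and branchlet rows only because purkinje_dendrite inherits its parts from dendrite via is_a.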
Interactive Queries [screenshot: KIND01]
Computed Protein Localization Data [screenshot: PROTLOC]
Client-Side Result Visualization (using AxioMap Viewer: Ilya Zaslavsky) [screenshot: PROTLOC-AxioMap]
Comparison & Summary: Model-Based Mediation
Columns: (Complex) Single World / Simple Multiple World | Complex Multiple World
• Integration target: global schema (common / shared) | 1..n shared domain maps
• Example scenario: suppliers’ catalogs / home buyer | complex scientific data (neuroscience, geoscience, …)
• Schema-level overlap: large / small | none … small
• Instance-level overlap: large / none | none
• Source correlation: direct, instance / schema level | indirect, conceptual (knowledge) level
• Techniques: schema transformations, schema integration (“structural” integration) | domain maps, formalized domain knowledge (“semantic bridges”) => model-based (“semantic”) mediation
• Integration languages, expressiveness: relational, semistructured, queries & transformations (e.g., SQL, XQuery, XSLT) | conceptual (description logics), object-oriented, deductive features (e.g., GCM, F-logic)
• Integrators: DB expert | domain expert + KRDB expert
Conclusions and Outlook
• Model-based Mediation Architecture
  – for complex multiple worlds scenarios (Neuroscience, ...)
  – sources export CMs (data “lifted” to conceptual level)
  – mediator employs DMs (“semantic road map”)
• Simple Prototype based on XSB/FLORA
  – source and result data situated in DM context
  – domain scientists are excited ...
• Some Open Issues
  – striking the right balance between complexity and expressiveness of DMs (e.g., subsumption and satisfiability of DMs should be decidable)
  – query processing/optimization
  – modeling query capabilities
  – semantic annotation tools for “dumb” sources
  – re-implement ... *sigh* ...
  – ...
ADDITIONAL MATERIAL STARTS HERE
ANATOM Domain Map [screenshot: ANATOM]
Model-Based Mediation with DOMAIN MAPS (DMs)
Integrated-CM(Z1,...) := get X1,... from Src1;
                         get X2,... from Src2;
                         LINK(Xi, Yj);
                         Zj = CM-QL(X1,...,Y1,...)
LINK(X,Y), e.g.:
  – X.zip = Y.zip
  – X.addr in Y.zip
  – X.zip overlaps Y.county
  – ...
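A minimal sketch of why LINK generalizes a join: the join condition is an arbitrary predicate, so syntactic equality (X.zip = Y.zip) is only one choice among "address lies in zip area" or "zip overlaps county". The lookup tables, placeholder codes (Z1, C1, ...), and function names are hypothetical stand-ins for the knowledge a domain map would supply.

```python
# Hypothetical lookup tables; Z1..Z3 and C1, C2 are placeholder codes.
ADDR_TO_ZIP = {"1 Main St": "Z1", "9 Oak Ave": "Z2"}
COUNTY_ZIPS = {"C1": {"Z1", "Z2"}, "C2": {"Z3"}}

def link_join(xs, ys, link):
    """Nested-loop join whose condition is an arbitrary LINK(X, Y) predicate."""
    return [(x, y) for x in xs for y in ys if link(x, y)]

def same_zip(x, y):             # LINK: X.zip = Y.zip (syntactic equality)
    return x.get("zip") is not None and x.get("zip") == y.get("zip")

def addr_in_zip(x, y):          # LINK: X.addr in Y.zip
    z = ADDR_TO_ZIP.get(x.get("addr"))
    return z is not None and z == y.get("zip")

def zip_overlaps_county(x, y):  # LINK: X.zip overlaps Y.county
    return x.get("zip") in COUNTY_ZIPS.get(y.get("county"), set())

homes = [{"addr": "1 Main St", "zip": "Z1"}, {"addr": "9 Oak Ave", "zip": "Z2"}]
```

Joining homes with a record that only carries county C1 matches both homes, although no attribute values are syntactically equal.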
• “Semantic Road Maps” for situating source data
=> navigational aid (browsing source classes at the conceptual level)
=> basis for integrated views across multiple worlds
=> link points (concepts) and labeled arcs (roles)
=> formal semantics (in FL and/or DLs)
Example: ANATOM DM
= anatomical entities (concepts) + is_a, has_a, overlaps, ... (roles)
=> from syntactic equality to semantic joins
Example Query Evaluation (I)
• Example: protein_distribution
  – given: organism, protein, brain_region
  – ANATOM DM:
    • recursively traverse the has_a_star paths under brain_region; collect all anatomical_entities
  – Source PROLAB:
    • join with anatomical structures and collect the value of attribute “image.segments.features.feature.protein_amount” where “image.segments.features.feature.protein_name” = protein and “study_db.study.animal.name” = organism
  – Mediator:
    • aggregate over all parents up to brain_region
    • report distribution
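A minimal sketch of these three steps, assuming the ANATOM has_a edges form a tree and reading "aggregate over all parents up to brain_region" as summing each structure's amount into every ancestor up to the requested region. The anatomy fragment, PROLAB rows, and function names are illustrative stand-ins, not the actual sources or their schemas.

```python
# Hypothetical ANATOM fragment (whole -> parts) and PROLAB rows.
HAS_A = {
    "cerebellum": ["cerebellar_cortex"],
    "cerebellar_cortex": ["molecular_layer", "purkinje_cell_layer"],
    "molecular_layer": ["purkinje_dendrite"],
}
PROLAB = [  # (animal name, protein_name, anatomical structure, protein_amount)
    ("rat", "RyR", "purkinje_dendrite", 30),
    ("rat", "RyR", "purkinje_cell_layer", 12),
    ("mouse", "RyR", "purkinje_dendrite", 7),
]

def has_a_star(region):
    """Region plus every anatomical entity below it along has_a paths."""
    seen, stack = {region}, [region]
    while stack:
        for part in HAS_A.get(stack.pop(), []):
            if part not in seen:
                seen.add(part)
                stack.append(part)
    return seen

def protein_distribution(organism, protein, brain_region):
    entities = has_a_star(brain_region)                      # ANATOM step
    rows = [(s, amt) for (org, prot, s, amt) in PROLAB       # PROLAB step
            if org == organism and prot == protein and s in entities]
    parent_of = {part: whole for whole, parts in HAS_A.items() for part in parts}
    totals = {}
    for structure, amount in rows:                           # mediator step
        node = structure
        while True:  # add the amount to the structure and each ancestor
            totals[node] = totals.get(node, 0) + amount
            if node == brain_region:
                break
            node = parent_of[node]
    return totals
```

For rat RyR under cerebellum, the cerebellar cortex total combines the dendrite and Purkinje cell layer amounts (30 + 12), giving a per-structure distribution at every level of the has_a tree.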
Interactive Queries [screenshot: KIND]
Summary & Outlook: Federation of Brain Data
[Figure: federated sources CCB, Montana SU; Surface atlas, Van Essen Lab; NCMIR, UCSD; stereotaxic atlas, LONI; MCell, CNL, Salk. Further labels: ANATOM; PROTLOC; Result (VML); Result (XML/XSLT); MODEL-BASED Mediation.]