From Data Integration to Semantic Mediation:
Addressing Heterogeneities in Data
Bertram Ludäscher
Knowledge-Based Information Systems Lab
San Diego Supercomputer Center
and
Department of Computer Science & Engineering
University of California, San Diego
Outline
1. Information Integration from a Database Perspective
2. XML-Based Data Integration
3. Model-Based / Semantic Mediation
4. Discussion
An Online Shopper’s Information Integration Problem
El Cheapo: “Where can I get the cheapest copy (including shipping cost) of Wittgenstein’s Tractatus Logico-Philosophicus within a week?”
Information Integration: addall.com
Sources: amazon.com, A1books.com, half.com, barnes&noble.com
“One-World” Scenario: XML-based mediator
Mediator (virtual DB) (vs. data warehouse)
A Home Buyer’s Information Integration Problem
Which houses for sale under $500k have at least 2 bathrooms, 2 bedrooms, a nearby school ranking in the upper third, in a neighborhood with below-average crime rate and diverse population?
Information Integration over sources: Realtor, Demographics, School Rankings, Crime Stats
“Multiple-Worlds” Scenario: XML-based mediator
A Neuroscientist’s Information Integration Problem
What is the cerebellar distribution of rat proteins with more than 70% homology with human NCS-1? Any structure specificity? How about other rodents?
Information Integration over sources: protein localization (NCMIR), neurotransmission (SENSELAB), sequence info (CaPROT), morphometry (SYNAPSE)
“Complex Multiple-Worlds” Scenario: Model-based mediator
A Geoscientist’s Information Integration Problem
What are the distribution and U/Pb zircon ages of A-type plutons in VA? How about their 3-D geometry? How does it relate to host rock structures?
Information Integration over sources: Geologic Map (Virginia), GeoChemical, GeoPhysical (gravity contours), GeoChronologic (Concordia), Foliation Map (structure DB)
“Complex Multiple-Worlds” Scenario: Model-based mediator
Information Integration Challenges: Heterogeneities = S^4 ...
• System aspects
– platforms, devices, distribution, APIs, protocols, …
• Syntaxes
– heterogeneous data formats (one for each tool ...)
• Structures
– heterogeneous schemas (one for each DB ...)
– heterogeneous data models (RDBs, ORDBs, OODBs, XMLDBs, flat files, …)
• Semantics
– unclear & “hidden” semantics: e.g., incoherent terminology, multiple / informal taxonomies, implicit assumptions, ...
Information Integration Challenges
• System aspects: “Grid” middleware
– distributed data & computing
– Web services, WSDL/SOAP, …
– sources = functions, files, databases, …
• Syntax & Structure: (XML-Based) Mediators
– wrapping, restructuring
– (XML) queries and views
– sources = (XML) databases
• Semantics: Model-Based/Semantic Mediators
– conceptual models and declarative views
– Semantic Web: ontologies, description logics, RDF(S), DAML+OIL, OWL, ...
– sources = knowledge bases (DB+CMs+ICs)
Reconciling S^4 heterogeneities (syntax, structure, semantics, system aspects):
– “gluing” together multiple data sources
– bridging information and knowledge gaps computationally
Information Integration from a DB Perspective
• Information Integration Problem
– Given: data sources S_1, ..., S_k (DBMS, web sites, ...) and user questions Q_1, ..., Q_n that can be answered using the S_i
– Find: the answers to Q_1, ..., Q_n
• The Database Perspective: source = “database”
– S_i has a schema (relational, XML, OO, ...)
– S_i can be queried
– define virtual (or materialized) integrated views V over S_1, ..., S_k using database query languages (SQL, XQuery, ...)
– questions become queries Q_i against V(S_1, ..., S_k)
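The database perspective can be illustrated with a minimal sketch using Python's standard sqlite3 module. The two source tables, their columns, and the sample rows are hypothetical stand-ins for S_1 and S_2; the view `v` plays the role of the virtual integrated view V, and the user question becomes a query against it.

```python
import sqlite3

# Two hypothetical sources with different schemas (assumed for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE s1_offers (title TEXT, price REAL, shipping REAL);
CREATE TABLE s2_listings (book TEXT, total_cost REAL);
INSERT INTO s1_offers VALUES ('Tractatus', 12.0, 4.0), ('SemWeb Tractat', 30.0, 5.0);
INSERT INTO s2_listings VALUES ('Tractatus', 14.5);

-- Virtual integrated view V over S1 and S2: one common schema.
CREATE VIEW v AS
  SELECT 's1' AS source, title, price + shipping AS total FROM s1_offers
  UNION ALL
  SELECT 's2' AS source, book AS title, total_cost AS total FROM s2_listings;
""")

def cheapest(title):
    """The user question as a query Q against V: cheapest total cost for a title."""
    return conn.execute(
        "SELECT source, total FROM v WHERE title = ? ORDER BY total LIMIT 1",
        (title,),
    ).fetchone()

print(cheapest("Tractatus"))
```

Because `v` is a view rather than a table, nothing is copied from the sources; replacing it with a populated table would correspond to the materialized (warehouse) alternative.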
Outline
1. Information Integration from a Database Perspective
2. XML-Based Data Integration
3. Model-Based / Semantic Mediation
4. Discussion
Extensible Markup Language (XML)
• (meta)language for marking up text & data with user-definable tags
– (X)HTML, XSLT, XML Schema, ...
– MathML, BioML, GeoML, NeuroML, ...
– XML-RPC, SOAP, WSDL, OWL, ...
• semistructured tree data model
– flexible: marked-up text, web-pages, databases, ...
• container model:
– “boxes within boxes”
Example, plain text: ... in their wonderful book called SemWeb Tractat by B. Schatz and T.B. Lee, the authors show how ...
Example, partially marked up: ... in their wonderful book called <title>SemWeb Tractat</title> by B. Schatz and T.B. Lee, the authors show how ...
Example, fully marked up: ... in their wonderful book called <title>SemWeb Tractat</title> by <author>B. Schatz</author> and <author>T.B. Lee</author>, the authors show how ...
Container view: book: [ title: “SemWeb Tractat”, author: “B. Schatz”, author: “T.B. Lee” ]
Tree view: book with children title (“SemWeb Tractat”), author (“B. Schatz”), author (“T.B. Lee”)
XML: <book> <title>SemWeb Tractat</title> <author>B. Schatz</author> <author>T.B. Lee</author> </book>
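As a small illustration of the tree and container views, the book element from this slide can be parsed with Python's standard xml.etree.ElementTree module. The helper `to_boxes` is a hypothetical name introduced here; it only renders the "boxes within boxes" reading as nested tuples.

```python
import xml.etree.ElementTree as ET

# The <book> element from the slide, as a string.
doc = ("<book><title>SemWeb Tractat</title>"
       "<author>B. Schatz</author><author>T.B. Lee</author></book>")

book = ET.fromstring(doc)

# Tree view: navigate from the root to its labeled children.
title = book.findtext("title")
authors = [a.text for a in book.findall("author")]

def to_boxes(elem):
    """Container view: each element is a (tag, content) box; content is
    either text (leaf) or a list of nested boxes."""
    children = list(elem)
    if not children:
        return (elem.tag, elem.text)
    return (elem.tag, [to_boxes(c) for c in children])

print(title, authors)
print(to_boxes(book))
```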
XML-Based Mediator Architecture
[Figure: a USER/Client sends a query Q(G(S_1, ..., S_k)) to the MEDIATOR and exchanges XML queries & results with it. The mediator holds the integrated global XML view G, given by the integrated view definition G(..) ← S_1(..) … S_k(..). Each source S_1, S_2, ..., S_k is accessed through a wrapper that exports an XML view.]
Some Challenges in XML-Based Integration ...
• XML Query/Transformation Languages
– DB community: QLs for semistructured data, e.g., TSIMMIS/MSL, Lorel, Yatl, ..., Florid/F-logic [InfSystems98]
– CSE/SDSC: XMAS [SSD99, SIGMOD99, WebDB99, EDBT00]
– W3C: XPath, XSLT, XQuery (Working Draft, June 2001)
• XML Schema Languages
– DTDs, RELAX NG, XML Schema, ... [XMLDM02]
• DB Theoreticians:
– Expressiveness/Complexity Trade-Off
  • querying: FO, (WF/S-)Datalog, FO(LFP), FO(PFP), ..., all
  • reasoning: query satisfiability, containment, equivalence
  • ...
XMAS: XML Matching And Structuring language
Integrated View Definition: “Find books from amazon.com and DBLP, join on author, group by authors and title”
CONSTRUCT <books>
            <book>
              $a1
              $t
              <pubs>
                $p { $p }
              </pubs>
            </book> { $a1, $t }
          </books>
WHERE <books.book>
        $a1 : <author />
        $t : <title />
      </> IN "amazon.com"
  AND <authors.author>
        $a2 : <author />
        <pubs> $p : <pub/> </>
      </> IN "www...DBLP… "
  AND value( $a1 ) = value( $a2 )
XMAS Algebra [QL98, SIGMOD99] [EDBT00]
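To make the semantics of this view definition concrete, here is a rough Python rendering of the same join and grouping using xml.etree.ElementTree. It is not XMAS and not the XMAS algebra: the two input documents, their contents, and the function name `integrated_view` are assumptions made for illustration, shaped to match the paths books.book and authors.author in the query.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical stand-ins for the two sources named in the query.
AMAZON = """<books>
  <book><author>B. Schatz</author><title>SemWeb Tractat</title></book>
  <book><author>A. Nobody</author><title>Unmatched</title></book>
</books>"""

DBLP = """<authors>
  <author><author>B. Schatz</author>
    <pubs><pub>Paper A</pub><pub>Paper B</pub></pubs></author>
</authors>"""

def integrated_view(amazon_xml, dblp_xml):
    """Join on author value, then group publications by (author, title)."""
    books = ET.fromstring(amazon_xml)
    authors = ET.fromstring(dblp_xml)
    groups = defaultdict(list)  # (author, title) -> list of publications
    for book in books.findall("book"):
        a1, t = book.findtext("author"), book.findtext("title")
        for entry in authors.findall("author"):
            if entry.findtext("author") == a1:  # value($a1) = value($a2)
                for p in entry.findall("pubs/pub"):
                    groups[(a1, t)].append(p.text)
    # CONSTRUCT clause: one <book> per (author, title) group.
    out = ET.Element("books")
    for (a1, t), pubs in groups.items():
        b = ET.SubElement(out, "book")
        ET.SubElement(b, "author").text = a1
        ET.SubElement(b, "title").text = t
        pubs_elem = ET.SubElement(b, "pubs")
        for p in pubs:
            ET.SubElement(pubs_elem, "pub").text = p
    return out

result = integrated_view(AMAZON, DBLP)
print(ET.tostring(result, encoding="unicode"))
```

The nested loops are the naive evaluation strategy; an algebraic plan would let an optimizer choose a better join order or push work to the sources.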
XML (XMAS) Query Processing
[Figure: query processing pipeline. Compile-time: the XML query Q and the XML global view definition G(S) pass through the Translator (algebraic plans), Composition Q(G) (composed plan), and the Rewriter/Optimizer Q’(S) (optimized plan). Run-time: query evaluation by Plan Execution.]
… New Challenges in (XML-Based) Mediation
• Global-As-View (GAV)
– user query Q over global relations G: Q(G)
– global relations G defined over source relations S: G(S)
– challenge: compute answers Q(G(V(S))) without computing all of V and G
– query rewriting (with limited source capabilities): Q’(S) = Q(G)
• Local-As-View (LAV)
– user query Q over global relations G: Q(G)
– source relations S defined over global relations G: S(G)
– challenge: “reverse/rewrite rules” from S(G) to some G’(S)
– answering queries using views: equivalent rewritings may not exist
– find maximally contained ones: Q’(G’(S)) ⊆ Q(G)
• Inter(CS)disciplinary research needed: DB, FP, LP
– GAV/LAV, view (un)folding, Clark’s completion, resolution, factoring
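The GAV case can be sketched in a few lines of Python. Everything here is a toy assumption: the sources are in-memory lists, G is a plain union view, and `Q_rewritten` is a hand-written unfolding of Q through G that pushes the selection to each source instead of materializing G first.

```python
# Hypothetical source relations (assumed sample data).
S1 = [{"title": "Tractatus", "price": 16.0},
      {"title": "SemWeb Tractat", "price": 35.0}]
S2 = [{"title": "Tractatus", "price": 14.5}]

def G(sources):
    """GAV: the global relation is defined as a view (here a union) over the sources."""
    return [row for src in sources for row in src]

def Q(global_rows, title):
    """User query posed against the global relation G."""
    return sorted(r["price"] for r in global_rows if r["title"] == title)

def Q_rewritten(sources, title):
    """Unfolded query Q'(S): the selection is evaluated per source,
    so the full global relation is never built."""
    prices = []
    for src in sources:
        prices.extend(r["price"] for r in src if r["title"] == title)
    return sorted(prices)

print(Q(G([S1, S2]), "Tractatus"), Q_rewritten([S1, S2], "Tractatus"))
```

In LAV the direction is reversed: the sources are described as views over G, so no such direct unfolding exists and the mediator has to search for a (maximally contained) rewriting.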
Querying XML Streams: A New Frontier
• New applications for stream-based XML processing:
– Continuous, real-time data streams (wireless sensor networks, …)
– Data / message transformation in Web services (SOAP, RMI, processing …)
– Extract-transform-load applications (Tera/Peta-byte archival migration, …)
• … leading to a new XML querying & transformation paradigm:
– how to execute (some) XML queries & transformations on very large (infinite) data streams using only limited memory
– XML stream machine (XSM): extended XML transducers with buffers
– XQuery → XSM network
• XSMs clearly outperform tree-based approaches on streamable queries (100x over Xalan) [A Transducer-Based XML Query Processor, Ludäscher, Mukhopadhyay, Papakonstantinou, VLDB’02]
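The contrast between tree-based and stream-based evaluation can be illustrated with Python's standard `xml.etree.ElementTree.iterparse`. This is not an XSM and makes no claim about the XSM design; it is only a generic event-driven sketch, with a hypothetical `<books>` stream and function name, showing how a simple query can emit results before the whole document has been read.

```python
import io
import xml.etree.ElementTree as ET

def stream_titles(xml_stream):
    """Yield each <title> text as soon as its end tag arrives, without
    first building the full document tree."""
    for event, elem in ET.iterparse(xml_stream, events=("end",)):
        if elem.tag == "title":
            yield elem.text
        elif elem.tag == "book":
            # Drop the finished subtree's children and text. The root still
            # keeps the emptied <book> shells, so memory use is reduced
            # rather than strictly constant.
            elem.clear()

data = io.BytesIO(
    b"<books><book><title>A</title></book><book><title>B</title></book></books>"
)
titles = list(stream_titles(data))
print(titles)
```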
Outline
1. Information Integration from a Database Perspective
2. XML-Based Data Integration
3. Model-Based / Semantic Mediation
4. Discussion
A Neuroscientist’s Information Integration Problem
What is the cerebellar distribution of rat proteins with more than 70% homology with human NCS-1? Any structure specificity? How about other rodents?
Information Integration over sources: protein localization (NCMIR), neurotransmission (SENSELAB), sequence info (CaPROT), morphometry (SYNAPSE)
“Complex Multiple-Worlds” Mediation
A Geoscientist’s Information Integration Problem
What are the distribution and U/Pb zircon ages of A-type plutons in VA? How about their 3-D geometry? How does it relate to host rock structures?
Information Integration over sources: Geologic Map (Virginia), GeoChemical, GeoPhysical (gravity contours), GeoChronologic (Concordia), Foliation Map (structure DB)
“Complex Multiple-Worlds” Mediation
What’s the Problem with XML & Complex Multiple-Worlds?
• XML is Syntax
– ... for labeled ordered trees
– ... all semantics lies outside of XML
  • XML DTDs => tags + nesting
  • XML Schema => DTDs + data modeling
  • need anything else? => write comments!
• Domain Semantics is Complex:
– implicit assumptions, hidden semantics
– sources seem unrelated to the non-expert
• Need Structure and Semantics beyond trees!
– employ richer OO models
– make domain semantics and “glue knowledge” explicit
– use ontologies to fix terminology and conceptualization
– avoid ambiguities by using KR and formal semantics
Information Integration Landscape
[Figure: systems plotted by conceptual distance (from one-world to multiple-worlds) and conceptual complexity/depth (from low to high). Regions: DB mediation techniques; ontologies, KR formalisms; model-based mediation. Plotted examples: addall, book-buyer, home-buyer, 24x7 consumer, BLAST, EcoCyc, Cyc, WordNet, GO, UMLS, MIA, Entrez, RiboWeb, Tambis. Application areas: Bioinformatics, Geo-, Ecoinformatics.]
XML-Based vs. Model-Based Mediation
[Figure: side-by-side comparison, both built on raw data.
XML-based mediation: XML elements, XML models; structural constraints (DTDs): parent, child, sibling, ..., e.g., A = (B*|C),D and B = ...; no domain constraints; Integrated-DTD := XML-QL(Src1-DTD, ...).
Model-based mediation: (XML) objects, conceptual models; classes, relations, is-a, has-a, ... (classes C1, C2, C3 and relation R); logical domain constraints (IF ... THEN ... rules); “glue maps” = domain & process maps (ontologies); Integrated-CM := CM-QL(Src1-CM, ...).]
CM ~ {Descr. Logic, ER, UML, RDF/XML(-Schema), …}; CM-QL ~ {F-Logic, DAML+OIL, …}
What’s the Glue? What’s in a Link?
• Syntactic Joins (equality)
– (X,Y) := X.SSN = Y.SSN
– (X,Y) := X.UMLS-ID = Y.UID
• “Speciality” Joins (similarity)
– (X,Y,Score) := BLAST(X,Y,Score)
• Semantic/Rule-Based Joins
– (X,Y,C) := X isa C, Y isa C, BLAST(X,Y,S), S>0.8 (homology, lub)
– (X,Y,[produces,B,increased_in]) := X produces B, B increased_in Y. (rule-based)
  e.g., X = …-secretase, B = beta amyloid, Y = Alzheimer’s disease
• CS Challenge:
– compile semantic joins into efficient syntactic ones
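The difference between the join kinds can be sketched in Python. The `ISA` table and the `SIMILARITY` scores are made-up stand-ins (the scores replace an actual BLAST call), and the semantic join below checks only a directly shared class rather than computing a least upper bound in a class hierarchy, so it is a simplification of the rule on the slide.

```python
# Hypothetical class assignments (X isa C) and precomputed similarity scores
# standing in for BLAST(X, Y, S); all names and values are assumptions.
ISA = {
    "ratNCS1": "calcium_sensor",
    "humanNCS1": "calcium_sensor",
    "ratActin": "cytoskeletal",
}
SIMILARITY = {("ratNCS1", "humanNCS1"): 0.95, ("ratActin", "humanNCS1"): 0.2}

def syntactic_join(xs, ys, x_key, y_key):
    """(X,Y) := X.key = Y.key, a plain equality join on attribute values."""
    return [(x, y) for x in xs for y in ys if x[x_key] == y[y_key]]

def semantic_join(xs, ys, threshold=0.8):
    """(X,Y,C) := X isa C, Y isa C, similarity(X,Y) > threshold."""
    out = []
    for x in xs:
        for y in ys:
            c = ISA.get(x)
            if c is not None and c == ISA.get(y) \
                    and SIMILARITY.get((x, y), 0.0) > threshold:
                out.append((x, y, c))
    return out

print(semantic_join(["ratNCS1", "ratActin"], ["humanNCS1"]))
```

The semantic join needs the glue knowledge (`ISA`) in addition to the data, which is why compiling it into an efficient syntactic join is listed as a challenge.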
Semantic Mediation Methodology @ SOURCES
• Lift Sources to export CMs: CM(S) = OM(S) + KB(S) + CON(S)
• Object Model OM(S):
– complex objects (frames), class hierarchy, OO constraints
• Knowledge Base KB(S):
– explicit representation of (“hidden”) source semantics
– logic rules over OM(S)
• Contextualization CON(S):
– situate OM(S) data using “glue maps” (ontologies):
  • domain maps DMs = terminological knowledge: concepts + roles
  • process maps PMs = “procedural knowledge”: states + transitions
Semantic Mediation Methodology @ MEDIATOR
• Integrated View Definition (IVD)
– declarative (logic) rules with object-oriented features
– defined over CM(S), domain maps, process maps
– needs “mediation engineers” = domain + KRDB experts
• Knowledge-Based Querying and Browsing (runtime):
– mediator composes the user query Q with the IVD
  ... rewrites (Q o IVD), sends subqueries to sources
  ... post-processes returned results (e.g., situate in context)
Model-Based Mediator Architecture
[Figure: a USER/Client poses queries against the CM (integrated view), defined by the Integrated View Definition (IVD), at the mediator engine (FL rule proc., LP rule proc., graph proc., XSB engine). CM queries & results are exchanged in XML with the sources S1, S2, S3, each accessed through an (XML-)wrapper and a CM-wrapper that exports CM(S) = OM(S) + KB(S) + CON(S) as a GCM. “Glue” maps (GMs), i.e., domain maps (DMs) and process maps (PMs), provide the semantic context CON(S).]
First results & demos: KIND prototype, formal DM semantics, PMs [SSDBM00] [VLDB00] [ICDE01] [NIH-HB01] [BNCOD02] [ER02] [EDBT02] [BioInf02]
Formalizing Glue Knowledge: Domain Map for SYNAPSE and NCMIR
Domain Map (DM) = labeled graph with concepts ("classes") and roles ("associations")
• additional semantics: expressed as logic rules (F-logic)
Domain Expert Knowledge: Purkinje cells and Pyramidal cells have dendrites that have higher-order branches that contain spines. Dendritic spines are ion (calcium) regulating components. Spines have ion binding proteins. Neurotransmission involves ionic activity (release). Ion-binding proteins control ion activity (propagation) in a cell. Ion-regulating components of cells affect ionic activity (release).
[Figure: the domain map graph and the corresponding DM in description logic]
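A domain map as a labeled graph can be sketched as a list of (concept, role, concept) edges. The edges below are a hand-made, simplified reading of the domain expert knowledge on this slide; the concept and role identifiers are assumptions chosen for illustration and are not taken from the actual SYNAPSE/NCMIR domain map or its description-logic form.

```python
# Labeled graph: (source concept, role, target concept). Simplified assumption.
DOMAIN_MAP = [
    ("purkinje_cell", "has", "dendrite"),
    ("pyramidal_cell", "has", "dendrite"),
    ("dendrite", "has", "branch"),
    ("branch", "contains", "spine"),
    ("spine", "isa", "ion_regulating_component"),
    ("spine", "has", "ion_binding_protein"),
    ("ion_binding_protein", "controls", "ion_activity"),
    ("ion_regulating_component", "affects", "ion_activity"),
]

def reachable(start, roles):
    """Concepts reachable from `start` by following only edges whose
    role label is in `roles` (a simple graph query over the domain map)."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for src, role, dst in DOMAIN_MAP:
            if src == node and role in roles and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen

print(sorted(reachable("purkinje_cell", {"has", "contains"})))
```

A query such as "which parts of a Purkinje cell could carry ion-binding proteins?" then becomes a reachability question over the glue knowledge rather than over any single source's data.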
Source Contextualization & DM Refinement
In addition to registering (“hanging off”) data relative to existing concepts, a source may also refine the mediator’s domain map: sources can register new concepts at the mediator.
Example: ANATOM Domain Map
Browsing Registered Data with Domain Maps
Query Processing Demo
Query results in context: contextualization CON(Result) wrt. ANATOM.
Mediator View Definition
DERIVE protein_distribution(Protein, Organism, Brain_region, Feature_name, Anatom, Value)
WHERE
  I:protein_label_image[proteins ->> {Protein}; organism -> Organism;
    anatomical_structures ->> {AS:anatomical_structure[name->Anatom]}],   % from PROLAB
  NAE:neuro_anatomic_entity[name->Anatom;   % from ANATOM
    located_in->>{Brain_region}],
  AS..segments..features[name->Feature_name; value->Value].
• provided by the domain expert and mediation engineer
• deductive OO language (here: F-logic)
Example: Inside Query Evaluation
“How does the parallel fiber output (Yale/SENSELAB) relate to the distribution of Ryanodine Receptors (UCSD/NCMIR)?”
1. push selection @SENSELAB: X1 := select targets of “output from parallel fiber”;
2. determine source context @MEDIATOR: X2 := “find and situate” X1 in ANATOM Domain Map;
3. compute region of interest (here: downward closure) @MEDIATOR: X3 := subregion-closure(X2);
4. push selection @NCMIR: X4 := select PROT-data(X3, Ryanodine Receptors);
5. compute protein distribution @MEDIATOR: X5 := compute aggregate(X4);
6. display in context @MEDIATOR/GUI: display X5 in context (ANATOM)
=> DEMONSTRATION
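The mediator-side steps of this plan (the downward closure, the pushed selection, and the aggregate) can be sketched in Python. The region hierarchy, the protein records, and all function names are toy assumptions; they are not the ANATOM domain map or NCMIR data, and in the real system the selection would be sent to the source rather than run over a local list.

```python
# Hypothetical "has subregion" edges of a domain map (assumed toy data).
HAS_SUBREGION = {
    "cerebellum": ["cerebellar_cortex"],
    "cerebellar_cortex": ["molecular_layer", "purkinje_cell_layer"],
    "molecular_layer": [],
    "purkinje_cell_layer": [],
}

# Hypothetical protein localization records at a source.
PROT_DATA = [
    {"region": "molecular_layer", "protein": "Ryanodine Receptor", "amount": 3.0},
    {"region": "purkinje_cell_layer", "protein": "Ryanodine Receptor", "amount": 5.0},
    {"region": "brain_stem", "protein": "Ryanodine Receptor", "amount": 9.0},
]

def subregion_closure(regions):
    """Downward closure: the given regions plus all their transitive subregions."""
    closure, stack = set(regions), list(regions)
    while stack:
        region = stack.pop()
        for sub in HAS_SUBREGION.get(region, []):
            if sub not in closure:
                closure.add(sub)
                stack.append(sub)
    return closure

def protein_distribution(start_regions, protein):
    x3 = subregion_closure(start_regions)                  # region of interest
    x4 = [r for r in PROT_DATA
          if r["region"] in x3 and r["protein"] == protein]  # selection
    x5 = {}                                                # aggregate per region
    for r in x4:
        x5[r["region"]] = x5.get(r["region"], 0.0) + r["amount"]
    return x5

print(protein_distribution(["cerebellar_cortex"], "Ryanodine Receptor"))
```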
Open Database & Knowledge Representation Issues
• Mix of Query Processing and Reasoning
– GAV & LAV with semantic query optimization (NIH BIRN, NSF GEON)
– description logic reasoner for DMs (FaCT)?
– reconciliation of conflicting DMs via argumentation frameworks (“games”) using well-founded and stable models of logic programs [ICDT97, PODS97, TCS00, TODS02]
• Modeling “Process Knowledge” => Process Maps
– formal semantics? (dynamic/temporal/Kripke models/Petri nets?)
– executable semantics? (Statelog?)
• Graph Queries over DMs and PMs
– expressible in F-logic [InfSystems98]
– scalability? (UMLS Domain Map has millions of entries)
• How to incorporate “procedural features”?
– Bioinformatics, Ecoinformatics, … => sources = DBs + analytical tools + …
– scientific workflow planning and management (“promoter identification workflow” for DOE SciDAC, NSF/ITR SEEK)
Process Maps with Abstractions and Elaborations: From Terminological to Procedural Glue
• nodes ~ states
• edges ~ processes, transitions
• blue/red edges: processes in Src1/Src2
[Figure: process map showing the general form of edges and related formalisms]
A Scientific Workflow: Promoter Identification
[Figure: promoter identification workflow (Matthew Coleman, LLNL, 2002).
Tools and steps: gi#’s from clusfavor; blast (human, other species); TRANSFAC; GRAIL; CLUSTAL; data consolidation.
Data labels: cDNA gi#, gene name; genomic gi#, chr #, gene location; TAF’s, location on genomic gi#’s, probabilities of match, probabilities of random match; GC island location, exon/intron location, repeats location, promoter location; validates polII promoter location; consensus sequences; promoter location, shared TAF’s across cluster, common consensus sequence.]
Questions: Are chr#’s in common? Are chr#’s locations in common? Are there conserved upstream sequences? Are gene locations conserved across species?
Questions: RNA POLII promoter? GpC Island present? Are there common TAF’s across genomic gi#?
Questions: Are there other common genes?
SDM Demo & Architecture
Translation Approach: Abstract Workflow (AWF) => Executable Workflow (EWF)
Analytical Pipelines: An Open Source Tool
A Commercial Tool for Analytical Pipelines
Summary: Mediation Scenarios & Techniques
Federated Databases    | XML-Based Mediation    | Model-Based Mediation
One-World              | One-/Multiple-Worlds   | Complex Multiple-Worlds
Common Schema          | Mediated Schema        | Common Glue Maps
SQL, rules             | XML query languages    | DOOD query languages
Schema Transformations | Syntax-Aware Mappings  | Semantics-Aware Mappings
Syntactic Joins        | Syntactic Joins        | “Semantic” Joins via Glue Maps
DB expert              | DB expert              | KRDB + domain experts
Glue?
GEON vs. SEEK
Outline
1. Information Integration from a Database Perspective
2. XML-Based Data Integration
3. Model-Based / Semantic Mediation
4. Discussion
Thank you!
Questions? Queries?
Some References
• Model-Based Mediation:
– A Model-Based Mediator System for Scientific Data Management, B. Ludäscher, A. Gupta, M. Martone, in Bioinformatics: Managing Scientific Data, Lacroix, Critchlow (eds), Morgan Kaufmann, to appear, 2003.
– Model-Based Mediation with Domain Maps, B. Ludäscher, A. Gupta, M. E. Martone, 17th Intl. Conference on Data Engineering (ICDE’01), Heidelberg, Germany, IEEE Computer Society, 2001.
– Managing Semistructured Data with FLORID: A Deductive Object-Oriented Perspective, B. Ludäscher, R. Himmeröder, G. Lausen, W. May, C. Schlepphorst, Information Systems, 23(8), Special Issue on Semistructured Data, 1998.
• XML-Based Mediation:
– VXD/Lazy Mediators: Navigation-Driven Evaluation of Virtual Mediated Views, B. Ludäscher, Y. Papakonstantinou, P. Velikhov, Intl. Conference on Extending Database Technology (EDBT’00), Konstanz, Germany, LNCS 1777, Springer, 2000.
– XML Streams: A Transducer-Based XML Query Processor, B. Ludäscher, P. Mukhopadhyay, Y. Papakonstantinou, Intl. Conference on Very Large Databases (VLDB’02), Hong Kong, 2002.
Knowledge Representation: Relating Theory to the World via Formal Models
John F. Sowa, Knowledge Representation: Logical, Philosophical, and Computational Foundations
“All models are wrong, but some are useful!”