Composing Mappings between Schemas using a Reference Ontology
ODBASE’04
Eduard Dragut, Ramon Lawrence
Iowa Database and Emerging Applications (IDEA) Laboratory
University of Iowa
{eduard-dragut, ramon-lawrence}@uiowa.edu

Outline
- Motivation
- Integration Approach
- Background
- Architecture Overview
- Ontological Matching
- Composing Mappings
- Global View Construction
- Experimental Results
- Future Work and Conclusions

Motivation
- Many organizations have pre-existing ontologies that are not suitable as global views but are suitable as reference ontologies to aid integration.
  - Example: The National Cancer Institute (NCI) and National Institutes of Health (NIH) have the caBIG grid prototype, which standardizes terminology (EVS, caDSR) and data elements in the cancer domain.
- Schema-to-ontology matching requires integrators to understand only their own schema instead of all the schemas that they may want to integrate.

Integration Approach
[Figure: integration architecture diagram. Labels: NCBI Database, Expression Database, Reference Ontology, Schema-to-ontology mapping (twice), Schema matching (twice), Compose & Merge, Global View, User Queries.]

Background: Ontologies and Integration
- Ontologies as the integrated, global view
  - Carnot project (Collet91) with Cyc ontology (Lenat90)
  - ONTOBROKER (Decker98), OBSERVER (Mena00)
- Tools for semi-automatically merging ontologies
  - PROMPT (Noy00), Ontobuilder (Gal04)
- Use ontologies as matching/integration aids
  - MOMIS (Beneventano03) using WordNet
  - Indirect (Xu03), CUPID (Madhavan01), COMA (Do02)
- Matching ontologies (Doan02)
- “Discovering” ontologies (Madhavan03)
  - Corpus-based matching

Background: Model Management
- Model management as proposed by (Bernstein03) is intended to allow high-level schema operations.
  - Operators include: Invert, Compose, Match, Merge.
  - Warning: Not all operators have fully defined semantics yet, and some of them are not completely automatic.
- Definitions:
  - A match is a semantic correspondence between schema elements.
  - A mapping between schema elements is an expression that relates the elements.
  - Note that most schema matching systems, such as COMA, produce matches, not mappings.

Architecture Overview
- We assume the existence of a pre-existing reference ontology that has been “accepted” in a domain.
  - The ontology is NOT a global view and may not cover the information in all schemas.
  - It cannot be edited.
- Global view construction is a 3-step process:
  1) Independently match each schema to the ontology.
  2) Compose schema-to-ontology matches to produce schema-to-schema mappings.
  3) Merge the schema mappings to produce the global view.
- The challenge is to automate this as much as possible.

Benefits of Approach
- Even with manual integration there are several benefits to using a reference ontology:
  1) An integrator must only understand their schema and the ontology and not other schemas to be integrated.
  2) Most validation is performed once during schema-to-ontology matching and not for every schema integrated.
  3) Schema-to-ontology matchings can be re-used every time a new schema is integrated into the federation.
- Automation can:
  1) Help construct schema-to-ontology matchings.
  2) Perform composition of mappings.
  3) Build a global view from the composed mappings.

Automation Challenges
- There are several challenges in automating this process:
  1) Schema matching systems such as COMA are designed for simpler relational schemas. Ontologies must be mapped into a suitable format for use with COMA.
  2) Schema-to-ontology matching is less accurate due to more complicated ontological structure and because the ontology may not model the entire domain or may model it differently.
  3) Composing matchings often results in many false matches which must be handled.
  4) A method for merging schemas using model management primitive operators is required.
     Note: Even with these operators, Merge is not fully automatic.

Background: COMA
- COMA (Do02) is a schema matching system that can flexibly combine different match algorithms and re-use match results.
- Match algorithms use names, paths, and schema properties in various ways.
- The mapping format between two schemas R and S is a triple (r,s,v) where r in R, s in S, and v is the similarity value in [0..1] between elements r and s.
- A schema in COMA is represented as a rooted directed acyclic graph. Schema elements are nodes which may be connected by links of different types.
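
The two data structures described on this slide can be sketched in a few lines of Python. This is only an illustration, not COMA's actual data model: the class names `Match` and `SchemaGraph`, the default link type "contains", and the dotted path notation are all assumptions. The element names come from the sample Excel order schema.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Match:
    """One match triple (r, s, v) between schemas R and S (hypothetical class)."""
    r: str    # element of schema R
    s: str    # element of schema S
    v: float  # similarity value, expected in [0, 1]

    def __post_init__(self):
        if not 0.0 <= self.v <= 1.0:
            raise ValueError(f"similarity {self.v} is outside [0, 1]")


@dataclass
class SchemaGraph:
    """Rooted directed graph: node name -> list of (link type, child name)."""
    root: str
    links: dict = field(default_factory=dict)

    def add_link(self, parent, child, link_type="contains"):
        # "contains" is an assumed link type used only for this sketch.
        self.links.setdefault(parent, []).append((link_type, child))
        self.links.setdefault(child, [])

    def paths(self):
        """List every root-to-node path (assumes the graph has no cycles)."""
        found = []

        def walk(node, prefix):
            path = prefix + [node]
            found.append(".".join(path))
            for _link_type, child in self.links.get(node, []):
                walk(child, path)

        walk(self.root, [])
        return found


graph = SchemaGraph("PurchaseOrder")
graph.add_link("PurchaseOrder", "DeliverTo")
graph.add_link("PurchaseOrder", "InvoiceTo")
graph.add_link("DeliverTo", "Address")
graph.add_link("InvoiceTo", "Address")  # Address is shared, so this is a DAG, not a tree
for path in graph.paths():
    print(path)
```

Because Address is reachable through both DeliverTo and InvoiceTo, it appears under two different paths, which is the kind of context a path-based match algorithm can use.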

Ontological Matching
- The first step is to convert ontologies in OWL/DAML format into COMA’s graph representation format.
  - We wrote a program that uses the JENA parser.
- During the conversion, we:
  1) Explicitly converted each named relationship in the ontology into a node and several edges in the graph.
  2) Explicitly encoded attributes inherited over IS-A links, since COMA does not support IS-A.
- After conversion, COMA automatically produces a schema-to-ontology match, because to COMA the task appears to be matching two relational schemas.

Converting Ontology to a Graph
[Figure: two panels, "Converting Named Relationships" and "Making IS-A Explicit".]
* Also create a single root POOntology as required by COMA.

Ontological Matching: Max versus noMax
- One challenge: what should this match look like?
- Two choices:
  1) Max - For each schema element, keep the best match with the ontology (if any).
  2) NoMax - For each schema element, keep all the matches that are above the cutoff threshold.
- Since Max generates at most one match per schema element, it is probably the best in semi-automated settings. NoMax will generate many matches, which must be filtered out by the user or during composition.
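
The difference between the two strategies can be sketched as two small filters over (schema element, ontology concept, similarity) triples. This is an illustrative sketch: the function names, the sample similarity values, and the choice to treat a value equal to the cutoff as passing are assumptions.

```python
def select_nomax(matches, cutoff):
    """NoMax: keep every candidate whose similarity reaches the cutoff."""
    return [m for m in matches if m[2] >= cutoff]


def select_max(matches, cutoff):
    """Max: keep only the best surviving candidate for each schema element."""
    best = {}
    for element, concept, sim in select_nomax(matches, cutoff):
        if element not in best or sim > best[element][2]:
            best[element] = (element, concept, sim)
    return list(best.values())


# Hypothetical candidate matches between schema elements and ontology concepts.
candidates = [
    ("postalCode", "Zip", 0.8),
    ("postalCode", "City", 0.55),
    ("contactName", "Person", 0.6),
    ("telephone", "Zip", 0.3),
]

print(select_nomax(candidates, cutoff=0.5))
print(select_max(candidates, cutoff=0.5))
```

With these sample values NoMax keeps both candidates for postalCode, while Max keeps only the Zip candidate; telephone has no candidate above the cutoff and is left unmatched by both.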

Composing Mappings
- Schema-to-ontology mappings must be composed to produce direct schema-to-schema mappings.
- Since mappings carry no semantics, two objects are assumed to be identical if they map to the same ontological concept.
  - Composition is performed transitively and is implemented using a natural join.
  - That is, if element r is similar to o and o is similar to s, then we assume that r is similar to s.
- For example: <postalCode,Zip,0.8> and <Zip,postCode,0.7> can be composed to yield <postalCode,postCode,0.75>.
  - The similarity values may be combined using various functions, although average is the most common.
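
The natural-join composition can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function name `compose` and the pluggable `combine` parameter are assumptions, and the average is used as the default because the slide names it as the most common choice.

```python
def compose(r_to_o, o_to_s, combine=lambda a, b: (a + b) / 2):
    """Join (r, o, v1) triples with (o, s, v2) triples on the shared concept o."""
    # Index the second mapping by ontology concept so the join is a lookup.
    by_concept = {}
    for o, s, v2 in o_to_s:
        by_concept.setdefault(o, []).append((s, v2))

    composed = []
    for r, o, v1 in r_to_o:
        for s, v2 in by_concept.get(o, []):
            composed.append((r, s, combine(v1, v2)))
    return composed


# The example from the slide: postalCode ~ Zip (0.8) and Zip ~ postCode (0.7).
result = compose([("postalCode", "Zip", 0.8)], [("Zip", "postCode", 0.7)])
for r, s, v in result:
    print(r, s, round(v, 2))
```

Passing a different `combine` function, for example `min`, changes how the two similarity values are merged without touching the join itself.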

Composition Example
[Figure: schemas S1 and S2 matched to ontology O, followed by the result of Compose between S1 and S2.
 S1 elements: Contact, CompanyName, Email, Name, Position.
 S2 elements: Contact, FirstName, LastName, Email, Position.
 O elements: contact, Person, FirstName, LastName, Organization, name.]

Global View Construction
- One possible application of schema-to-schema mappings constructed in this way is building a global view.
- The paper gives a script that uses model management operators to compose any number of schema-to-ontology mappings into a single global view for all sources.
- Note that this algorithm is neither perfect nor fully automatic, as the mappings are not perfect and the Merge operator may require human intervention.
Global View Construction Example

Experimental Setup
- Matched the 5 sample order schemas: CIDR, Excel, Noris, Paragon, and Apertum used to evaluate COMA.
  - Numbered these schemas 1, 2, 3, 4, and 5.
- Created a reference ontology that models some of the domain (but not all of it) and is quite different than the schemas (uses IS-A for example).
- Used the matchings specified with COMA as ground-truth.
- Evaluation metrics:
  - Precision = # of correct matches / # of suggested matches
  - Recall = # of correct matches returned / # of total matches
  - Overall = Recall * (2 - 1 / Precision)
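
The three metrics can be computed from a set of suggested matches and a set of ground-truth matches. The sketch below is illustrative: the function name and the pair representation are assumptions. It computes Overall through the algebraically equivalent form (2 * correct - suggested) / total, which gives the same value as Recall * (2 - 1 / Precision) whenever Precision is above zero and avoids dividing by a zero precision.

```python
def evaluate(suggested, truth):
    """Return (precision, recall, overall) for suggested vs. ground-truth matches."""
    suggested, truth = set(suggested), set(truth)
    if not truth:
        raise ValueError("ground truth must contain at least one match")
    correct = len(suggested & truth)
    precision = correct / len(suggested) if suggested else 0.0
    recall = correct / len(truth)
    # Same value as recall * (2 - 1 / precision) when precision > 0.
    overall = (2 * correct - len(suggested)) / len(truth)
    return precision, recall, overall


# Hypothetical task: 6 true matches, 4 suggested, 3 of them correct.
truth = {("a", "1"), ("b", "2"), ("c", "3"), ("d", "4"), ("e", "5"), ("f", "6")}
suggested = {("a", "1"), ("b", "2"), ("c", "3"), ("x", "9")}
precision, recall, overall = evaluate(suggested, truth)
print(round(precision, 3), round(recall, 3), round(overall, 3))
```

Overall becomes negative as soon as precision drops below 0.5, that is, when more than half of the suggested matches are wrong.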
Reference Order Ontology

Experiment #1: Schema-to-Ontology Matching
- Goal: Evaluate the accuracy of schema-to-ontology matching.
- Method:
  - Automatically convert ontology into COMA format and match each schema with ontology.
- Evaluation:
  - Measured the percent overlap of the schema and ontology.
    - For many schemas, only 60% of their concepts were in the ontology.
  - Evaluated the precision, recall, and overall measures relative to the number of matches that could be found.
    - E.g. If overlap was 60% and recall was 50%, then only 30% of all schema elements were matched BUT of all the possible matches, 50% were found.

Experiment #1: Results
[Figure: chart for schemas 1 to 5. Left axis: Precision/Recall/Overall (-0.6 to 1). Right axis: Ontology Overlap (-0.6 to 1). Series: Max Precision, noMax Precision, Max Recall, noMax Recall, Max Overall, noMax Overall, Ontology Overlap.]
* noMax is poor for schema 5 as Buyer incorrectly matched to ontology.

Experiment #2: Schema-to-Schema Mappings
- Goal: Determine the accuracy of producing schema-to-schema mappings by composing schema-to-ontology matchings.
- Method:
  - Used automatically generated schema-to-ontology matchings and composed them.
  - Evaluated composition result against COMA answers for direct matching.
  - Evaluated noMax and Max techniques and manual mappings.

Experiment #2: Results (Overall)
[Figure: chart of Overall (0 to 1) for match tasks 1<->2, 1<->3, 1<->4, 1<->5, 2<->3, 2<->4, 2<->5, 3<->4, 3<->5, 4<->5. Series: Max, noMax, COMA, Manual.]
* 1 <-> 2 is poor because of Street mapping.
* 4 <-> 5 is poor because of Buyer mapping.

Experiment #3: Improving Direct Matches
- Goal: Determine if the accuracy of producing direct schema-to-schema mappings can be improved by re-using schema-to-ontology matches.
- Method:
  - Generate schema-to-schema mappings by composing schema-to-ontology matchings and then use this as past matching information for COMA.
  - Allow COMA to perform direct match given this information.
  - Evaluated noMax and Max techniques and manual mappings.

Experiment #3: Results (Overall)
[Figure: chart of Overall (0 to 1) for match tasks 1<->2, 1<->3, 1<->4, 1<->5, 2<->3, 2<->4, 2<->5, 3<->4, 3<->5, 4<->5. Series: Re-use Max, Re-use noMax, COMA, Re-use Manual.]
* 1 <-> 2 is poor because of Street mapping.

Discussion and Conclusions
- Major findings:
  1) Schema-to-ontology mappings can be constructed with good accuracy (70-80% precision, 60% recall).
  2) The composition of schema-to-ontology matchings produces similar results to direct matching with COMA.
  3) Max has higher precision than noMax but with lower recall. Max is probably best when the user must filter incorrect matches and always saves work.
  4) It is valuable to re-use schema-to-ontology matchings (either automatic or manually constructed) to improve the accuracy of direct matchings.
- Major conclusion: There is a benefit to building semi-automatic schema-to-ontology matchings for use in integration and global view construction.

Future Work and Challenges
- The major challenge is that the mappings carry no semantics, which often results in incorrect matches being suggested after composition.
  - We are currently working on extending the mappings to capture semantics to avoid many of these cases.
- The approach is not fully automatic (nor will it ever be). However, most manual work is in the schema-to-ontology matching stage. We need better algorithms and tools to support this matching.
- We want to perform experimental evaluation on larger ontologies such as those from NCI.
  - Issue: Many ontologies are not in a suitable form for intermediate mapping with schemas (they are just taxonomies).

Extra Slides

Ontology Conversion Algorithm
1) Each ontology concept (class) becomes a node in the graph.
2) For each property (attribute) of a class, add a node to the graph and connect it to its class.
3) Non-basetype properties (those with domain and range in ontology) are converted by:
   3a) Creating a node in the graph for the relationship.
   3b) Adding an edge from the class domain to this node.
   3c) Adding an edge from the new node to the range class.
   Note: Do not currently support properties that have a domain or range that is union/intersection of concepts.
4) IS-A expanded by graph traversal.
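
The four steps above can be sketched on a toy ontology held in plain Python dictionaries. This is not the authors' JENA-based program: the function name, the input layout, the `Class.attribute` node naming, the single-parent IS-A table, and the edges from the root to every class are all assumptions made for illustration. Only attributes are inherited here; inherited relationships are not handled.

```python
def ontology_to_graph(classes, attributes, relationships, is_a, root="Root"):
    """Convert a toy ontology into (nodes, edges) following steps 1 to 4.

    classes:       list of class names
    attributes:    {class: [attribute names]}
    relationships: [(name, domain class, range class)]
    is_a:          {child class: parent class}, assumed acyclic
    """
    nodes, edges = {root}, set()

    def inherited_attributes(cls):
        # Step 4: walk up the IS-A chain and collect every attribute on the way.
        collected = []
        while cls is not None:
            collected.extend(attributes.get(cls, []))
            cls = is_a.get(cls)
        return collected

    for cls in classes:
        nodes.add(cls)              # Step 1: one node per class
        edges.add((root, cls))      # Assumption: a single root links to each class
        for attr in inherited_attributes(cls):
            attr_node = f"{cls}.{attr}"
            nodes.add(attr_node)    # Step 2: one node per attribute
            edges.add((cls, attr_node))

    for name, domain, range_cls in relationships:
        nodes.add(name)                  # Step 3a: node for the relationship
        edges.add((domain, name))        # Step 3b: domain class -> relationship
        edges.add((name, range_cls))     # Step 3c: relationship -> range class

    return nodes, edges


# Hypothetical toy ontology.
nodes, edges = ontology_to_graph(
    classes=["Person", "Customer", "Organization"],
    attributes={"Person": ["firstName", "lastName"],
                "Customer": ["accountCode"],
                "Organization": ["name"]},
    relationships=[("worksFor", "Person", "Organization")],
    is_a={"Customer": "Person"},
)
print(sorted(nodes))
```

In this toy run Customer receives its own copies of firstName and lastName, which is how IS-A is made explicit for a matcher that has no notion of inheritance.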

Mapping Composition Challenges
- Composing an N:1 match with a 1:N match results in a cross-product.
- We cannot handle these cases, as the mappings have no semantics.
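
A small sketch makes the cross-product visible. The street1 to street4 elements come from the sample Excel schema; the ontology concept `Street` and the target elements `addressLine1` and `addressLine2` are hypothetical names chosen for illustration.

```python
def compose_pairs(r_to_o, o_to_s):
    """Join (r, o) pairs with (o, s) pairs on the shared ontology concept."""
    return [(r, s) for r, o in r_to_o for o2, s in o_to_s if o == o2]


# N:1 match: four schema elements all map to one (hypothetical) concept.
s1_to_o = [("street1", "Street"), ("street2", "Street"),
           ("street3", "Street"), ("street4", "Street")]
# 1:N match: the same concept maps to two (hypothetical) elements of another schema.
o_to_s2 = [("Street", "addressLine1"), ("Street", "addressLine2")]

composed = compose_pairs(s1_to_o, o_to_s2)
print(len(composed))  # 4 * 2 candidate pairs
for pair in composed:
    print(pair)
```

Every source element is paired with every target element, and nothing in the matches says which pairs are the intended ones.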

Global View Construction Script
Operator GlobalView(ArraySchemas, ArrayMappings, O, n)
    // ArraySchemas stores the n schemas
    // ArrayMappings stores the n schema-to-ontology mappings
    // GlobalView2 takes its arguments in the order (S1, S2, O, S1_O, S2_O)
    If n <= 0 Then Return empty schema;
    If n == 1 Then Return ArraySchemas[0];
    S1 = ArraySchemas[0];
    S2 = ArraySchemas[1];
    map1 = ArrayMappings[0];
    map2 = ArrayMappings[1];
    < S, map > = GlobalView2(S1, S2, O, map1, map2);
    For (i=2; i <= n-1; i++)
        S1 = S;
        map1 = map;
        S2 = ArraySchemas[i];
        map2 = ArrayMappings[i];
        < S, map > = GlobalView2(S1, S2, O, map1, map2);
    end for;
    Return < S, map >;
Computes Global View of N Source Schemas (with ontology mappings)

Global View Construction Script (2)
Operator GlobalView2(S1, S2, O, S1_O, S2_O)
    S1_S2 = S1_O * Invert(S2_O);
    < M, S1_M, S2_M > = Merge(S1, S2, S1_S2);
    M_O = Invert(S1_M) * S1_O + Invert(S2_M) * S2_O;
    Return < M, M_O >;
Computes Global View of Two Source Schemas (with ontology mappings)
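
The operator script can be traced on toy data with a short Python sketch. Several things here are assumptions and not part of the slides: mappings are plain sets of (source, target) pairs with no similarity values, `*` is read as composition and `+` as set union, and `naive_merge` is a hypothetical stand-in for Merge that simply keeps all of S1 and adds the unmatched elements of S2. The real Merge operator may need human intervention. Unlike the script, the sketch always returns a (schema, mapping) pair, including for a single schema.

```python
def invert(mapping):
    """Invert: swap the two sides of every pair."""
    return {(b, a) for a, b in mapping}


def compose(m1, m2):
    """Compose ('*'): join m1 and m2 on their shared middle element."""
    return {(a, c) for a, b in m1 for b2, c in m2 if b == b2}


def naive_merge(s1, s2, s1_s2):
    """Hypothetical stand-in for Merge: keep S1, add the unmatched part of S2."""
    matched = {b for _, b in s1_s2}
    merged = set(s1) | {e for e in s2 if e not in matched}
    s1_m = {(e, e) for e in s1}
    s2_m = invert(s1_s2) | {(e, e) for e in s2 if e not in matched}
    return merged, s1_m, s2_m


def global_view2(s1, s2, o, s1_o, s2_o, merge=naive_merge):
    # o is unused here, mirroring the script, where it is only passed along.
    s1_s2 = compose(s1_o, invert(s2_o))
    m, s1_m, s2_m = merge(s1, s2, s1_s2)
    m_o = compose(invert(s1_m), s1_o) | compose(invert(s2_m), s2_o)
    return m, m_o


def global_view(schemas, mappings, o, merge=naive_merge):
    """Fold global_view2 over all schemas, left to right."""
    if not schemas:
        return set(), set()
    view, view_o = set(schemas[0]), set(mappings[0])
    for schema, mapping in zip(schemas[1:], mappings[1:]):
        view, view_o = global_view2(view, schema, o, view_o, mapping, merge)
    return view, view_o


# Hypothetical schemas and schema-to-ontology mappings.
schemas = [{"s1.postalCode", "s1.city"}, {"s2.postCode", "s2.phone"}, {"s3.zip"}]
mappings = [
    {("s1.postalCode", "Zip"), ("s1.city", "City")},
    {("s2.postCode", "Zip"), ("s2.phone", "Phone")},
    {("s3.zip", "Zip")},
]
view, view_o = global_view(schemas, mappings, o={"Zip", "City", "Phone"})
print(sorted(view))
print(sorted(view_o))
```

Swapping `naive_merge` for a different function is the place where a semi-automatic or manual Merge step would plug in.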

Sample Order Schema: Excel XML Schema
<?xml version="1.0"?>
<Schema name="PurchaseOrder.biz" xmlns="urn:schemas-microsoft-com:xml-data" xmlns:dt="urn:schemas-microsoft-com:datatypes">
  <ElementType name="PurchaseOrder" content="eltOnly">
    <element type="Header"/>
    <element type="Items"/>
    <element type="Footer"/>
    <element type="InvoiceTo"/>
    <element type="DeliverTo"/>
  </ElementType>
  <ElementType name="Items" content="eltOnly">
    <AttributeType name="itemCount" dt:type="int"></AttributeType>
    <attribute type="itemCount"/>
    <element type="Item" maxOccurs="*" minOccurs="1"/>
  </ElementType>
  <ElementType name="Item" content="empty">
    <AttributeType name="yourPartNumber" dt:type="string"></AttributeType>
    <AttributeType name="unitPrice" dt:type="number"></AttributeType>
    <AttributeType name="unitOfMeasure" dt:type="string"></AttributeType>
    <AttributeType name="salesValue" dt:type="number"></AttributeType>
    <AttributeType name="quantity" dt:type="number"></AttributeType>
    <AttributeType name="partNumber" dt:type="string"></AttributeType>
    <AttributeType name="partDescription" dt:type="string"></AttributeType>
    <AttributeType name="itemNumber" dt:type="int"></AttributeType>
    <attribute type="itemNumber"/>
    <attribute type="yourPartNumber"/>
    <attribute type="partNumber"/>
    <attribute type="partDescription"/>
    <attribute type="quantity"/>
    <attribute type="unitOfMeasure"/>
    <attribute type="unitPrice"/>
    <attribute type="salesValue"/>
  </ElementType>
  <ElementType name="InvoiceTo" content="eltOnly">
    <element type="Contact"/>
    <element type="Address"/>
  </ElementType>
  <ElementType name="Header" content="eltOnly">
    <AttributeType name="yourAccountCode" dt:type="string"></AttributeType>
    <AttributeType name="ourAccountCode" dt:type="string"></AttributeType>
    <AttributeType name="orderNum" dt:type="string"></AttributeType>
    <AttributeType name="orderDate" dt:type="date"></AttributeType>
    <attribute type="orderNum"/>
    <attribute type="orderDate"/>
    <attribute type="ourAccountCode"/>
    <attribute type="yourAccountCode"/>
    <element type="Contact"/>
  </ElementType>
  <ElementType name="Footer" content="empty">
    <AttributeType name="totalValue" dt:type="number"></AttributeType>
    <attribute type="totalValue"/>
  </ElementType>
  <ElementType name="DeliverTo" content="eltOnly">
    <element type="Contact"/>
    <element type="Address"/>
  </ElementType>
  <ElementType name="Contact" content="empty">
    <AttributeType name="telephone" dt:type="string"></AttributeType>
    <AttributeType name="e-mail" dt:type="string"></AttributeType>
    <AttributeType name="contactName" dt:type="string"></AttributeType>
    <AttributeType name="companyName" dt:type="string"></AttributeType>
    <attribute type="contactName"/>
    <attribute type="companyName"/>
    <attribute type="e-mail"/>
    <attribute type="telephone"/>
  </ElementType>
  <ElementType name="Address" content="empty">
    <AttributeType name="street4" dt:type="string"></AttributeType>
    <AttributeType name="street3" dt:type="string"></AttributeType>
    <AttributeType name="street2" dt:type="string"></AttributeType>
    <AttributeType name="street1" dt:type="string"></AttributeType>
    <AttributeType name="stateProvince" dt:type="string"></AttributeType>
    <AttributeType name="postalCode" dt:type="string"></AttributeType>
    <AttributeType name="country" dt:type="string"></AttributeType>
    <AttributeType name="city" dt:type="string"></AttributeType>
    <attribute type="street1"/>
    <attribute type="street2"/>
    <attribute type="street3"/>
    <attribute type="street4"/>
    <attribute type="city"/>
    <attribute type="stateProvince"/>
    <attribute type="postalCode"/>
    <attribute type="country"/>
  </ElementType>
</Schema>

Experiment #2: Precision
[Figure: chart of Precision (0 to 1) for match tasks 1<->2, 1<->3, 1<->4, 1<->5, 2<->3, 2<->4, 2<->5, 3<->4, 3<->5, 4<->5. Series: Max, noMax, COMA, Manual.]

Experiment #2: Recall
[Figure: chart of Recall (0 to 1) for match tasks 1<->2, 1<->3, 1<->4, 1<->5, 2<->3, 2<->4, 2<->5, 3<->4, 3<->5, 4<->5. Series: Max, noMax, COMA, Manual.]

Experiment #3: Results (Precision)
[Figure: chart of Precision (0 to 1) for match tasks 1<->2, 1<->3, 1<->4, 1<->5, 2<->3, 2<->4, 2<->5, 3<->4, 3<->5, 4<->5. Series: Max, noMax, COMA, Manual.]

Experiment #3: Results (Recall)
[Figure: chart of Recall (0 to 1) for match tasks 1<->2, 1<->3, 1<->4, 1<->5, 2<->3, 2<->4, 2<->5, 3<->4, 3<->5, 4<->5. Series: Max, noMax, COMA, Manual.]