Logics for Data and Knowledge Representation
Application of (Ground) ClassL
Outline
Ontologies
Lightweight Ontologies
Classifications
Optimization of Classifications
Document Classification in LOs
Query-answering in LOs
Semantic Matching
Ontology
Ontologies are explicit specifications of conceptualizations.
They are often thought of as directed graphs whose nodes represent concepts and whose edges represent relations between concepts.
[Figure: an example ontology graph with the concepts Animal, Bird, Mammal, Predator, Herbivore, Tiger, Cat, Goat, Chicken, Head and Body, connected by Is-a, Part-of and Eats relations.]
Concept
The notion of concept is understood as defined in Knowledge Representation, i.e., as a set of objects or individuals.
This set is called the concept extension or the concept interpretation.
Concepts are often lexically defined, i.e., they have natural language names which are used to describe the concept extensions.
Relation
The notion of relation is understood as a set of ordered pairs, where the first item of each pair comes from the source concept and the second from the target concept.
The backbone structure of the ontology graph is a taxonomy in which the relations are ‘is-a’, ‘part-of’ and ‘instance-of’, whereas the remaining structure of the graph supplies auxiliary information about the modeled domain and may include relations like ‘located-in’, ‘sibling-of’, ‘ant’, etc.
Ontology as a graph
A mathematical definition comes from graph theory: an ontology is an ordered pair
O = <V, E>
in which V is the set of vertices describing the concepts and E is the set of edges describing relations.
Tree-like Ontologies
Take the ontology in the previous slide and remove the auxiliary relations…
… we get a tree-like ontology consisting of the backbone structure with ‘is-a’, ‘part-of’ and even ‘instance-of’ relations.
They are informal Lightweight Ontologies.
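The step from a general ontology graph to a tree-like ontology can be sketched in Python. This is a minimal sketch, not the course's tool: the edge list is an assumed reading of the Animal figure (in particular the two ‘eats’ edges are illustrative), and the helper names `backbone` and `is_rooted_tree` are hypothetical.

```python
# Labeled edges (child, relation, parent): an assumed reading of the Animal figure.
EDGES = [
    ("Bird", "is-a", "Animal"), ("Mammal", "is-a", "Animal"),
    ("Predator", "is-a", "Mammal"), ("Herbivore", "is-a", "Mammal"),
    ("Tiger", "is-a", "Predator"), ("Cat", "is-a", "Predator"),
    ("Goat", "is-a", "Herbivore"), ("Chicken", "is-a", "Bird"),
    ("Head", "part-of", "Animal"), ("Body", "part-of", "Animal"),
    ("Tiger", "eats", "Goat"), ("Cat", "eats", "Chicken"),  # auxiliary relations
]

BACKBONE_RELATIONS = {"is-a", "part-of", "instance-of"}


def backbone(edges):
    """Drop auxiliary relations, keeping only the backbone taxonomy."""
    return [e for e in edges if e[1] in BACKBONE_RELATIONS]


def is_rooted_tree(edges):
    """True if every node has at most one outgoing edge, there is one root and no cycle."""
    parent = {}
    for child, _, par in edges:
        if child in parent:  # a second outgoing edge: not a tree
            return False
        parent[child] = par
    nodes = set(parent) | set(parent.values())
    if len(nodes - set(parent)) != 1:  # exactly one node without a parent
        return False
    for node in nodes:  # every upward walk must terminate (no cycles)
        seen = set()
        while node in parent:
            if node in seen:
                return False
            seen.add(node)
            node = parent[node]
    return True


print(is_rooted_tree(EDGES))            # False: Tiger and Cat also have 'eats' edges
print(is_rooted_tree(backbone(EDGES)))  # True
```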
[Figure: the Animal ontology graph again, with its Is-a, Part-of and Eats relations.]
Descriptive vs. Classification Ontologies
Some ontologies are used to describe a piece of the world, such as the Gene Ontology, an Industry ontology, etc. The purpose is to give a clear description of the world. This is usually the first idea that comes to mind when people talk about ontologies.
Other ontologies are used to classify things, such as books, documents, web pages, etc. The aim is to provide domain-specific categories under which individuals are organized. Such ontologies usually take the form of classifications, with or without explicit meaningful links.
We will see the difference further on, in the transformation into formal Lightweight Ontologies.
Why ‘Lightweight’ Ontologies?
Two observations:
1. The majority of existing ontologies are ‘simple’ taxonomies or classifications, i.e., categories used to classify resources.
2. Ontologies with arbitrary relations do exist, but in general no intuitive reasoning techniques support them.
… so we need ‘lightweight’ ontologies.
Outline
Ontologies
Lightweight Ontologies
Classifications
Optimization of Classifications
Document Classification in LOs
Query-answering in LOs
Semantic Matching
Lightweight Ontologies
A (formal) lightweight ontology is a triple
O = <N, E, C>
where N is a finite set of nodes, E is a set of edges on N such that <N, E> is a rooted tree, and C is a finite set of concepts expressed in a formal language F, such that for any node ni ∈ N there is one and only one concept ci ∈ C, and, if ni is the parent node of nj, then cj ⊑ ci.
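The definition can be turned into a small checker. The sketch below makes a simplifying assumption that is not part of the definition: every concept is a conjunction of atomic propositions, stored as a frozenset, so that cj ⊑ ci holds exactly when every conjunct of ci also occurs in cj. The function names and the example data are hypothetical.

```python
def subsumed(cj, ci):
    """cj ⊑ ci for conjunctions of atoms: every conjunct of ci occurs in cj."""
    return ci <= cj


def is_lightweight_ontology(parent, concept):
    """Check O = <N, E, C>.

    parent  -- maps each non-root node to its unique parent (the edges E)
    concept -- maps every node in N to exactly one concept (a frozenset of atoms)
    """
    nodes = set(concept)
    if not (set(parent) | set(parent.values())) <= nodes:
        return False  # an edge mentions a node without a concept
    if len(nodes - set(parent)) != 1:
        return False  # <N, E> must have exactly one root
    for node in nodes:  # <N, E> must be acyclic
        seen = set()
        while node in parent:
            if node in seen:
                return False
            seen.add(node)
            node = parent[node]
    # If n_i is the parent of n_j, then c_j ⊑ c_i.
    return all(subsumed(concept[j], concept[i]) for j, i in parent.items())


concept = {
    "Animal": frozenset({"animal"}),
    "Bird": frozenset({"animal", "bird"}),
    "Chicken": frozenset({"animal", "bird", "chicken"}),
}
parent = {"Bird": "Animal", "Chicken": "Bird"}
print(is_lightweight_ontology(parent, concept))  # True
```

Replacing the concept of Chicken with a concept that does not entail ‘bird’ makes the check fail, since the child would no longer be subsumed by its parent.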
From Tree-like Ontologies to LOs
[Figure: the tree-like Animal ontology and the corresponding lightweight ontology: the Is-a links become ⊑, while Head and Body remain attached by Part-of links.]
In Classification Semantics…
[Figure: the same transformation under classification semantics: the Part-of links of Head and Body also become ⊑.]
From Tree-like Ontologies to LOs (cont.)
For a descriptive tree-like ontology, the backbone taxonomy of ‘is-a’ relations intuitively coincides with the ‘subsumption’ relation in LOs. But ‘part-of’ relations have to be modeled as a new kind of binary relation in order to preserve the semantics.
For a classification ontology, the semantics of the node labels is their extensional interpretation, i.e., the documents (books, websites, etc.) that should be classified under the nodes. Therefore the ‘part-of’ relation also follows the intuition of ‘subsumption’ and can be transformed directly into ‘⊑’ in the target LO.
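The two translations can be contrasted in a few lines. This is a minimal sketch, assuming the backbone contains only ‘is-a’ and ‘part-of’ edges; `to_classl` is a hypothetical helper, and the statements are produced as plain strings only for display.

```python
def to_classl(edges, semantics):
    """Translate backbone edges (child, relation, parent) into ClassL-style statements.

    semantics == "descriptive":    is-a -> ⊑, part-of kept as a separate binary relation
    semantics == "classification": both is-a and part-of -> ⊑
    Only 'is-a' and 'part-of' edges are expected here.
    """
    if semantics not in ("descriptive", "classification"):
        raise ValueError(f"unknown semantics: {semantics}")
    subsumptions, relations = [], []
    for child, rel, par in edges:
        if rel == "is-a" or (rel == "part-of" and semantics == "classification"):
            subsumptions.append(f"{child} ⊑ {par}")
        else:
            relations.append(f"{rel}({child}, {par})")
    return subsumptions, relations


edges = [("Bird", "is-a", "Animal"), ("Head", "part-of", "Animal")]
print(to_classl(edges, "descriptive"))
# (['Bird ⊑ Animal'], ['part-of(Head, Animal)'])
print(to_classl(edges, "classification"))
# (['Bird ⊑ Animal', 'Head ⊑ Animal'], [])
```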
Populated (Lightweight) Ontologies
In Information Retrieval, the term classification is seen as the process of arranging a set of objects (e.g., documents) into categories or classes.
A classification ontology is said to be populated if a set of objects has been classified under ‘proper’ nodes.
Thus a populated (Lightweight) Ontology contains a new type of link: instance-of.
Example of a Populated Ontology
[Figure: the Animal lightweight ontology with documents attached to its nodes by Instance-of links: ‘Chicken Soup’, ‘How to Raise Chicken’, ‘Tom and Jerry’, ‘www.protectTiger.org’, … (e.g., ‘Chicken Soup’ under Chicken and ‘Tom and Jerry’ under Cat).]
Lightweight Ontologies in ClassL: TBox
Subsumption terminologies:
‘… C is a finite set of concepts expressed in a formal language F, such that for any node ni ∈ N, there is one and only one concept ci ∈ C, and, if ni is the parent node for nj, then cj ⊑ ci.’
1. Bird ⊑ Animal
2. Mammal ⊑ Animal
3. Chicken ⊑ Bird
4. Cat ⊑ Predator
5. …
Observation: a tree-like ontology can be transformed into a lightweight ontology, but not vice versa.
Populated LOs in ClassL: TBox + ABox
Subsumption terminologies: ‘… cj ⊑ ci.’
‘Instance-of’ links: concept assertions!
1. …
2. …
3. …
4. …
5. Chicken(ChickenSoup)
6. Cat(TomAndJerry)
7. …
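Reasoning over such a TBox and ABox can be sketched for the simplest case: atomic concepts and atomic subsumption axioms, where membership follows by walking up the subsumption chain. Predator ⊑ Mammal is read off the figure rather than the list above; the function names are hypothetical, and a real ClassL reasoner handles arbitrary formulas.

```python
# TBox: atomic subsumptions C ⊑ D, stored as C -> set of direct subsumers D.
TBOX = {
    "Bird": {"Animal"},
    "Mammal": {"Animal"},
    "Chicken": {"Bird"},
    "Predator": {"Mammal"},  # from the figure
    "Cat": {"Predator"},
}
# ABox: concept assertions C(a) coming from the 'instance-of' links.
ABOX = [("Chicken", "ChickenSoup"), ("Cat", "TomAndJerry")]


def subsumers(concept, tbox):
    """All D with tbox ⊨ concept ⊑ D (reflexive, transitive closure)."""
    result, stack = {concept}, [concept]
    while stack:
        for sup in tbox.get(stack.pop(), ()):
            if sup not in result:
                result.add(sup)
                stack.append(sup)
    return result


def instances(concept, tbox, abox):
    """All individuals a such that tbox, abox ⊨ concept(a)."""
    return {a for c, a in abox if concept in subsumers(c, tbox)}


print(sorted(instances("Animal", TBOX, ABOX)))  # ['ChickenSoup', 'TomAndJerry']
print(sorted(instances("Bird", TBOX, ABOX)))    # ['ChickenSoup']
```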
Outline
Ontologies
Lightweight Ontologies
Classifications
Optimization of Classifications
Document Classification in LOs
Query-answering in LOs
Semantic Matching
Classifications…
Classification hierarchies are easy to use...
... for humans.
Classification hierarchies are pervasive (Google, Yahoo, Amazon, our PC directories, email folders, address books, etc.).
Classification hierarchies are widely used in industry (Google, Yahoo, eBay, Amazon, BBC, CNN, libraries, etc.).
Classification hierarchies have been studied for a very long time (e.g., the Dewey Decimal Classification system, DDC; the Library of Congress Classification system, LCC; etc.).
Classification Example: Yahoo! Directory
Classification Example: Email Folders
Classification Example: E-Commerce Category
Classifications… more
Classification hierarchies are lightweight (no roles; trees or simple DAGs, …).
Classification hierarchies are a kind of concept hierarchy.
Labels are natural language sentences: useful, but hard to deal with in an automated way.
Links are of the kind “child-of” (e.g., “economy child-of Europe”), where in an ontology you would have {instance-of}, roles, or {is-a} links.
There is no clear semantics for either the labels at nodes or the links.
How to use such informal information?
Recall: Lightweight Ontologies
A (formal) lightweight ontology is a triple
O = <N, E, C>
where N is a finite set of nodes, E is a set of edges on N such that <N, E> is a rooted tree, and C is a finite set of concepts expressed in a formal language F, such that for any node ni ∈ N there is one and only one concept ci ∈ C, and, if ni is the parent node of nj, then cj ⊑ ci.
[Annotations on the definition: “A classification already has.” and “To be fixed”.]
What do LOs Bring?
We know that a lightweight ontology is a formal conceptualization of a domain in terms of concepts and {is-a, instance-of} relationships.
Lightweight ontologies (LOs) add formal semantics and {instance-of} relationships to classification hierarchies.
In short: LOs make classifications formal!
LOs and Ground Class Logic
Ground ClassL provides a formal language (syntax + semantics) to model lightweight ontologies, where:
concepts are modeled by propositions and formulas;
the ‘is-a’ relationship is modeled by subsumption (⊑);
and the ‘is-instance-of’ relationship is modeled by individual assertions (i.e., wffs like P(a)).
Label Semantics
Natural language words are often ambiguous.
E.g., Java (an island, a beverage, an OO programming language).
When a word is used together with other words in a label, improper senses can be pruned.
E.g., in “Java Language” only the 3rd sense of Java is preserved.
[Figure: part of a classification hierarchy (levels 0 to 4): Subjects (1), Computers and Internet (3), Programming (5), Java Language (7), Java Beans (8).]
From NL Labels to Labels in Class Logic
There are several approaches to rewriting a natural language label into a ClassL proposition. Following (Giunchiglia et al., 2007), we may distinguish four steps:
1. Tokenization (get distinct words): Italian Pictures → ‘Italian’, ‘Pictures’;
2. Word stemming (get to a basic form): Pictures → picture;
3. Rewrite each word into its proposition: picture → picture-noun-1 ⊔ picture-noun-2 ⊔ … ⊔ picture-verb-2;
4. Prune inconsistent senses: picture-noun-1 ⊔ picture-noun-2 ⊔ … ⊔ picture-verb-2 → pictureN1.
Class Logic Label Examples
E.g. 1: “Java” becomes the proposition
Java#1 ⊔ Java#2 ⊔ Java#3
where Java#i is a propositional variable representing the i-th sense of the word “Java” according to a dictionary (e.g., WordNet).
E.g. 2: “Java Beans” becomes
(Java#1 ⊔ Java#2 ⊔ Java#3) ⊓ (Bean#1 ⊔ Bean#2)
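The four steps can be sketched in Python. Everything in this sketch is an assumption for illustration: the sense inventory `SENSES` is a tiny hypothetical stand-in for WordNet (only its Java and Bean sense counts mirror the examples above), lemmatization just strips a plural ‘s’, and pruning is driven by an explicit `keep` argument rather than by real sense disambiguation. Connectives such as ‘and’ are not handled.

```python
# Hypothetical sense inventory: number of dictionary senses per lemma.
SENSES = {"java": 3, "bean": 2, "language": 1}


def lemmatize(token):
    """Naive stand-in for a lemmatizer: strip a plural 's' if that gives a known lemma."""
    if token.endswith("s") and token[:-1] in SENSES:
        return token[:-1]
    return token


def label_to_proposition(label, keep=None):
    """Rewrite a label as a conjunction of disjunctions of word senses.

    Assumes pruning leaves at least one sense per word.
    """
    keep = keep or {}
    groups = []
    for token in label.lower().split():              # 1. tokenization
        lemma = lemmatize(token)                     # 2. stemming / lemmatization
        senses = range(1, SENSES.get(lemma, 1) + 1)  # 3. one variable per sense
        if lemma in keep:                            # 4. prune senses ruled out by context
            senses = [i for i in senses if i in keep[lemma]]
        groups.append([f"{lemma.capitalize()}#{i}" for i in senses])
    if len(groups) == 1:
        return " ⊔ ".join(groups[0])
    return " ⊓ ".join(
        "(" + " ⊔ ".join(g) + ")" if len(g) > 1 else g[0] for g in groups
    )


print(label_to_proposition("Java"))        # Java#1 ⊔ Java#2 ⊔ Java#3
print(label_to_proposition("Java Beans"))  # (Java#1 ⊔ Java#2 ⊔ Java#3) ⊓ (Bean#1 ⊔ Bean#2)
print(label_to_proposition("Java Language", keep={"java": {3}}))  # Java#3 ⊓ Language#1
```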
Advantages of Propositions
NL labels are ambiguous; propositions are NOT!
The extensional semantics of propositions naturally maps nodes to real-world objects.
Labels as propositions allow us to deal with the standard problems in classification (e.g., document classification, query answering, and matching) by means of ClassL reasoning, mainly the SAT problem.
Formalizing the Meaning of Links (1)
Child nodes in a classification are always considered in the context of their parent nodes.
Child nodes therefore specialize the meaning of their parent nodes.
This is the contextuality property of classifications.
Formalizing the Meaning of Links (2)
General intersection relationship (a): can be used to represent facets. The meaning of node 2 is C = A ⊓ B.
Subsumption relationship (b): child nodes are specific cases of the parent nodes. The meaning of node 2 is B.
[Figure: in both (a) and (b), node 1, labeled A, has a child node 2, labeled B.]
General Intersection Example
[Figure: l1 = “Subjects”, l3 = “Computers and Internet” (hardware, software, networking, …) and l5 = “Programming” (scheduling, planning, computer programming); under general intersection, node 5 denotes computer programming.]
Concept at a Node
Parental contextuality is formalized in ClassL by the notion of “concept at a node.”
The concept Cr at the root node r is the class proposition (label) used to denote the node.
The concept Ci at a node ni is the conjunction of the proposition Pi (the label of ni) and the concept Cj at the node nj that is the parent of ni (if it has a parent).
In ClassL: Ci = Pi ⊓ Cj.
Concept at a Node
The concept at a node ni can be computed as the conjunction of all the labels on the path from the root of the classification hierarchy to ni.
Concepts at nodes capture the classification semantics by using both the meaning of the labels (propositions defined by using WordNet and a linguistic analysis) and the nodes' positions.
Concept at a Node: Example
[Figure: a classification with root Europe (1), whose children are Pictures (2) and Wine and Cheese (3); Italy (4) and Austria (5) are children of Pictures.]
In ClassL: C4 = Ceurope ⊓ Cpictures ⊓ Citaly
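Computing a concept at a node amounts to walking from the node up to the root and conjoining the labels met on the way. A minimal sketch, assuming the tree shape of this example (Italy and Austria under Pictures) and writing each label's concept as a symbolic name such as Ceurope rather than as a full ClassL formula; the names `PARENT`, `LABEL` and `concept_at_node` are hypothetical.

```python
PARENT = {2: 1, 3: 1, 4: 2, 5: 2}  # node -> parent; node 1 (Europe) is the root
LABEL = {1: "europe", 2: "pictures", 3: "wine and cheese", 4: "italy", 5: "austria"}


def concept_at_node(node, parent=PARENT, label=LABEL):
    """Conjunction of the concepts at labels on the path from the root to `node`."""
    path = []
    while node is not None:
        path.append("C" + label[node])
        node = parent.get(node)  # the root has no parent, which ends the walk
    return " ⊓ ".join(reversed(path))


print(concept_at_node(4))  # Ceurope ⊓ Cpictures ⊓ Citaly
```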
What have we done?
Calculated the concepts at labels and the concepts at nodes.
In which format?
ClassL:
Java#1 ⊔ Java#2 ⊔ Java#3
Ceurope ⊓ Cpictures ⊓ Citaly
…
We have built the ClassL formulas for each node!
Distinctions Among Ontology, LO and CLS
[Figure: the same five-node structure (A; B and C; D and E) shown as (1) an ontology with Is-a, Instance-of, Part-of, Locate-in and Likes links; (2) a tree-like ontology keeping only the backbone taxonomy (Is-a, Instance-of, Part-of); (3) a classification, the most common format, with Child-of links; and (4) a formal lightweight ontology with concepts A, A ⊓ B, A ⊓ C, A ⊓ B ⊓ D and A ⊓ B ⊓ E linked by ⊑, obtained by formalization under classification semantics. The panels are grouped into descriptive ontologies and classification ontologies.]
Outline
Ontologies
Lightweight Ontologies
Classifications
Optimization of Classifications
Document Classification in LOs
Query-answering in LOs
Semantic Matching
Rational LOs
LOs may not be perfect… Reconstruct an LO based on the “most specific subsumer” relation.
Nodes get the parents which describe them most specifically while still being more general.
The new structure is called a Rational LO (RLO).
NOTE: the classification semantics does not change.
[Figure: an LO rooted at EU with the nodes Schengen States, Italy, France, Germany and Pictures, and the corresponding RLO, in which Italy, France and Germany are children of Schengen States.]
Optimization of Classifications
Problem: find ‘the most specific subsumer’ of a given node.
Suppose we have, for all nodes in the LO, the concepts at labels in ClassL, i.e., the wffs obtained after NLP.
Then we can use the ‘subsumption’ reasoning service to find the minimal subsumer with respect to the ordering ‘⊑’.
E.g.: Italy ⊑ EU, SchengenState ⊑ EU, Italy ⊑ SchengenState…
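Assuming the subsumption service has already produced the pairs a ⊑ b between concepts (here the pairs of the example, plus France ⊑ SchengenState and Germany ⊑ SchengenState read off the figure), the most specific subsumer of each concept can be found as sketched below. The function names are hypothetical, and the sketch assumes no two distinct concepts are equivalent.

```python
def transitive_closure(pairs):
    """All (a, b) with a ⊑ b derivable from the given pairs by transitivity."""
    closure = set(pairs)
    while True:
        new = {(a, d) for a, b in closure for c, d in closure if b == c} - closure
        if not new:
            return closure
        closure |= new


def most_specific_subsumers(pairs):
    """Map each concept to the minimal (w.r.t. ⊑) elements among its strict subsumers."""
    below = transitive_closure(pairs)
    concepts = {x for pair in below for x in pair}
    result = {}
    for x in concepts:
        supers = {b for a, b in below if a == x and b != x}
        minimal = {s for s in supers if not any(t != s and (t, s) in below for t in supers)}
        if minimal:
            result[x] = minimal
    return result


pairs = [("Italy", "EU"), ("SchengenState", "EU"), ("Italy", "SchengenState"),
         ("France", "SchengenState"), ("Germany", "SchengenState")]
print(most_specific_subsumers(pairs)["Italy"])  # {'SchengenState'}
```

Italy ⊑ EU holds, but SchengenState lies strictly between the two, so Italy is re-attached under Schengen States, as in the RLO of the figure.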
Outline
Ontologies
Lightweight Ontologies
Classifications
Optimization of Classifications
Document Classification in LOs
Query-answering in LOs
Semantic Matching
Document Classification
Each document d in a classification is assigned a proposition Cd in ClassL.
Cd is called the document concept.
Cd is built from d in two steps:
1. keywords are retrieved from d by using standard text mining techniques;
2. keywords are converted into propositions by using the methodology discussed above.
“Get specific” Rule
For any given document d and its concept Cd, we classify d in each node ni such that:
1. ⊨ Cd ⊑ Ci (i.e., the concept at node ni is more general than Cd);
2. and there is no node nj (j ≠ i) whose concept at node Cj is more specific than Ci and more general than Cd, i.e., such that
⊨ Cj ⊑ Ci and ⊨ Cd ⊑ Cj.
(Subsumption reasoning of ClassL)
Example
Suppose we need to classify “Professional Java, JDK-5th Edition” by W. Clay Richardson et al.
The document concept of such a document d is Cd = Java#3 ⊓ Programming#2.
Node 7 is the only node which conforms to the “get specific” rule.
[Figure: a classification hierarchy with levels 0 to 4: Subjects (1); Business and Investing (2), with Small Business and Entrepreneurship (4) and New Business Enterprises (6) below it; Computers and Internet (3), with Programming (5), Java Language (7) and Java Beans (8) below it.]
Example (cont’d)
Suppose we need to classify “Visual Basic.Net Programming for Business” by Philip A. Koneman.
The document concept of such a document d is Cd = VisualBasicNet#1 ⊓ Programming#2 ⊓ Business#1.
Nodes 2 and 5 conform to the “get specific” rule.
[Figure: the same classification hierarchy, nodes (1) to (8).]
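The get-specific rule and the two examples above can be reproduced with a small sketch. Several assumptions are made that are not in the slides: the tree shape is our reading of the figure; concepts are conjunctions of atoms, one atom per node label (e.g. ‘business’ for “Business and Investing”); the background knowledge is a hypothetical set of atomic axioms (programming ⊑ computers, computers ⊑ subjects, business ⊑ subjects); and ‘more specific’ is read as strictly more specific. Under these assumptions, subsumption reduces to checking that every conjunct of the more general concept follows from the more specific one.

```python
# Our reading of the figure: node -> parent, and one atom per node label.
PARENT = {2: 1, 3: 1, 4: 2, 5: 3, 6: 4, 7: 5, 8: 7}
LABEL = {1: "subjects", 2: "business", 3: "computers", 4: "small_business",
         5: "programming", 6: "new_business", 7: "java", 8: "java_beans"}
# Hypothetical background knowledge: atomic axioms A ⊑ B.
TBOX = {"programming": {"computers"}, "computers": {"subjects"}, "business": {"subjects"}}


def closure(atoms):
    """All atoms entailed by a conjunction of atoms under the TBox."""
    result, stack = set(atoms), list(atoms)
    while stack:
        for sup in TBOX.get(stack.pop(), ()):
            if sup not in result:
                result.add(sup)
                stack.append(sup)
    return result


def subsumed(c, d):
    """T ⊨ C ⊑ D: every conjunct of D follows from C."""
    return set(d) <= closure(c)


def concept_at_node(node):
    """Conjunction (as a set of atoms) of the labels from the root down to `node`."""
    atoms = set()
    while node is not None:
        atoms.add(LABEL[node])
        node = PARENT.get(node)
    return frozenset(atoms)


def get_specific(doc_concept):
    """Nodes n_i with ⊨ Cd ⊑ Ci and no n_j such that Cj is strictly below Ci and ⊨ Cd ⊑ Cj."""
    c = {n: concept_at_node(n) for n in LABEL}
    candidates = [n for n in c if subsumed(doc_concept, c[n])]
    return sorted(
        i for i in candidates
        if not any(j != i and subsumed(c[j], c[i]) and not subsumed(c[i], c[j])
                   for j in candidates)
    )


print(get_specific({"java", "programming"}))                          # [7]
print(get_specific({"visual_basic_net", "programming", "business"}))  # [2, 5]
```

Only the candidates (nodes that already satisfy ⊨ Cd ⊑ Cj) are scanned in the second condition, which is exactly what the rule requires of nj.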
What have we done so far?
Classified documents.
How? With the get-specific algorithm! But how do we implement the algorithm?
ClassL!
We are reasoning with the ‘Concept Realization’ service of ClassL!
(With an empty ABox.)
⊨ Cj ⊑ Ci and ⊨ Cd ⊑ Cj
Outline
Ontologies
Lightweight Ontologies
Classifications
Optimization of Classifications
Document Classification in LOs
Query-answering in LOs
Semantic Matching
Intuitive Query-answering
Query answering on a hierarchy of documents, based on a query q given as a set of keywords, is defined in two steps:
1. The ClassL proposition Cq is built from q by converting q’s keywords as described above.
2. The set of answers (retrieval set) to q is defined as a set of subsumption checking problems in Ground ClassL:
Aq = {d ∈ document | T ⊨ Cd ⊑ Cq}.
Query-Answering: A Problem
Searching all the documents may be expensive (millions of documents may be classified).
We define a set of nodes which contain only answers to a query q as follows:
Nsq = {ni ∈ node | T ⊨ Ci ⊑ Cq}
NOTE: each document d in a node ni ∈ Nsq is an answer to the query q, because T ⊨ Cd ⊑ Ci by definition of classification, and hence T ⊨ Cd ⊑ Cq by transitivity.
Thus all the documents d in the nodes of Nsq belong to Aq.
Query-Answering: Classification Set
We extend Nsq (named sound classification answer) by adding a set of nodes (named query classification set) defined as:
Clq = {ni ∈ node | d ∈ ni and ⊨ Cd ≡ Cq}
i.e., the nodes which constitute the classification set of a document d whose concept Cd is equivalent to Cq.
Query-Answering: Sound Answer Set
The set of answers (retrieval set) to q is finally defined as the following set:
Asq =df {d ∈ ni | ni ∈ Nsq} ∪ {d ∈ ni | ni ∈ Clq and ⊨ Cd ⊑ Cq}.
Under this definition, the answers to a query are the documents from nodes whose concepts are more specific than the query concept, plus the documents in the query classification set whose own concepts are more specific than the query concept.
Example
Suppose that a user makes a query q, which is converted into Cq = Java#3 ⊓ Cobol#1, where Cobol#1 is “common business-oriented language.”
It can be shown that Nsq = {7, 8}.
Exercise: show it.
[Figure: Subjects (1), Computers and Internet (3), Programming (5), Java Language (7) and Java Beans (8), with nodes 7 and 8 marked as Nsq.]
Example (cont’d)
It can be shown that Nsq = {7, 8}.
“Java for COBOL Programmers, 2nd ed.” is classified in node 5, so it is not an answer when only Nsq = {7, 8} is used.
We then consider Clq to compute more answers; among them are the documents in node 5.
[Figure: the same part of the hierarchy, with the query classification set Clq marked.]
Sound Answer Set: Remark
The set Asq is sound (i.e., it contains only answers to q), but not complete (i.e., it may not contain all the answers to q).
See the next example.
[Figure: a classification with root European Union (1) and child Pictures (2); a document d is classified in node 2.]
Sound Answer Set: Example
Suppose that a user makes a query q like “video or pictures of Italy,” which is converted into Cq = Italy#1 ⊓ (Video#2 ⊔ Pictures#1).
Cq is equivalent to Cq1 ⊔ Cq2, where:
Cq1 = Video#2 ⊓ Italy#1,
Cq2 = Pictures#1 ⊓ Italy#1.
[Figure: the same classification, European Union (1) with child Pictures (2) containing d.]
Sound Answer Set: Example (cont’d)
But ⊭ C2 ⊑ Cq1, hence a document d in node 2 about Rome, with Cd = Pictures#1 ⊓ Rome#1, is not retrieved, since:
Nsq = {ni | ⊨ Ci ⊑ Cq} = ∅ and
Clq = {1}, so d ∉ Asq.
(Asq is not complete.)
[Figure: the same classification, European Union (1) with child Pictures (2) containing d.]
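The sound answer set and its incompleteness can be checked mechanically on this example. The sketch below is an assumption-laden illustration, not the original system: formulas are nested tuples; entailment T ⊨ C ⊑ D is decided by enumerating truth assignments (fine for a handful of variables, exponential in general); the background knowledge Rome ⊑ Italy ⊑ EU is a hypothetical TBox; and Clq is computed as the get-specific classification of a virtual document whose concept is Cq, following the definition of the query classification set.

```python
from itertools import product


# Formulas: an atom is a str; compound formulas are ("and", f, g, ...) or ("or", f, g, ...).
def atoms(f):
    return {f} if isinstance(f, str) else set().union(*(atoms(g) for g in f[1:]))


def holds(f, v):
    if isinstance(f, str):
        return v[f]
    op, *args = f
    if op == "and":
        return all(holds(g, v) for g in args)
    if op == "or":
        return any(holds(g, v) for g in args)
    raise ValueError(f"unknown connective: {op}")


def entails(tbox, c, d):
    """T ⊨ C ⊑ D, decided by enumerating every truth assignment."""
    names = sorted(atoms(c) | atoms(d) | {x for a, b in tbox for x in atoms(a) | atoms(b)})
    for values in product([False, True], repeat=len(names)):
        v = dict(zip(names, values))
        model_of_t = all(holds(b, v) or not holds(a, v) for a, b in tbox)
        if model_of_t and holds(c, v) and not holds(d, v):
            return False
    return True


TBOX = [("rome", "italy"), ("italy", "eu")]          # hypothetical background knowledge
NODES = {1: "eu", 2: ("and", "eu", "pictures")}      # concepts at nodes 1 and 2
DOCS = {"d": (2, ("and", "pictures", "rome"))}       # document -> (node, Cd)
CQ = ("and", "italy", ("or", "video", "pictures"))   # "video or pictures of Italy"


def get_specific(concept):
    cand = [n for n, cn in NODES.items() if entails(TBOX, concept, cn)]
    return {i for i in cand
            if not any(j != i and entails(TBOX, NODES[j], NODES[i])
                       and not entails(TBOX, NODES[i], NODES[j]) for j in cand)}


def sound_answers(cq):
    ns = {n for n, cn in NODES.items() if entails(TBOX, cn, cq)}
    cl = get_specific(cq)  # classification set of a virtual document with Cd ≡ Cq
    answers = {d for d, (n, _) in DOCS.items() if n in ns}
    answers |= {d for d, (n, cd) in DOCS.items() if n in cl and entails(TBOX, cd, cq)}
    return ns, cl, answers


ns, cl, answers = sound_answers(CQ)
print(ns, cl, answers)                   # set() {1} set()
print(entails(TBOX, DOCS["d"][1], CQ))   # True: d is a correct answer that is missed
```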
Some Comments
The edge structure of an LO is considered neither for document classification nor for query answering.
The edge information becomes redundant, as it is implicitly encoded in the notion of “concept at a node”.
There is more than one way to build an LO from a set of concepts at nodes.
What have we done in Query Answering?
Found the set of documents.
How? By finding the concepts that are subsumed by the query.
But how do we implement it?
ClassL!
We are reasoning with the ‘Concept subsumption’ service of ClassL!
⊨ Cd ⊑ Cq
Outline
Ontologies
Lightweight Ontologies
Classifications
Optimization of Classifications
Document Classification in LOs
Query-answering in LOs
Semantic Matching
Date: a matching?
Why Matching?
Most common forms of knowledge can be represented as graphs. The heterogeneity between knowledge graphs demands making explicit the relations between their elements, such as semantic equivalence.
Some popular situations that can be modeled as a matching problem are:
Concept matching in semantic networks.
Schema matching in distributed databases.
Ontology matching (ontology “alignment”) in the Semantic Web.
A Matching Problem (Example)
Relational DB Schemas
Let us consider the following relational database (RDB) model, say “BANK”:
[Figure: the “BANK” relational model (Giunchiglia & Shvaiko, 2007).]
Relational DB Schemas: Representation 1
We can represent the RDB model “BANK” as a graph (a tree) with root “BANK”:
the RDB model is first partitioned into relations, then into attributes and data instances.
[Figure: tree representation 1 of “BANK” (Giunchiglia & Shvaiko, 2007).]
Relational DB Schemas: Representation 2
We can represent the RDB model “BANK” as a graph (a tree) with root “BANK”:
the model is partitioned into relations, then into tuples, attributes and data instances.
[Figure: tree representation 2 of “BANK” (Giunchiglia & Shvaiko, 2007).]
Relational DB Schemas: NOTEs
Which of the two representations is preferable depends on the concrete task.
It is always possible to transform one representation into the other.
In contrast to the example of the RDB “BANK”, DB schemas are seldom trees. More often, DB schemas are translated into Directed Acyclic Graphs (DAGs).
OODB Schemas
Let us consider the RDB “BANK” in terms of an object-oriented DB (OODB) schema:
BRANCH (Street, City, Zip)
PERSON (F_Name, L_Name)
STAFF : PERSON (Position, Salary, Manager)
The resulting graph is:
[Figure: the OODB schema graph of “BANK” (Giunchiglia & Shvaiko, 2007).]
OODB Schemas: NOTEs
OODB schemas capture more semantics than relational DBs.
In particular, an OODB schema:
explicitly expresses subsumption relations between elements;
admits special types of arcs for part/whole relationships, in terms of aggregation and composition.
Semi-structured Data
Neither RDBs nor OODBs capture all the features of semi-structured or unstructured data (Buneman, 1997):
semi-structured data do not possess a regular structure (they are schemaless);
the “structure” of semi-structured data may be partial or even implicit.
Typical examples are HTML and XML.
XML Schemas
XML schemas can be represented as DAGs. The graph from the RDB “BANK” could also be obtained from an XML schema.
[Figure: an XML schema graph (Giunchiglia & Shvaiko, 2007).]
XML Schemas: NOTEs
XML schemas often represent hierarchical data models. In this case the only relationships between the elements are {is-a}.
Attributes in XML are used to represent extra information about data. There are no strict rules telling us when data should be represented as elements or as attributes.
Concept Hierarchies
Def. A concept hierarchy is a semi-formal conceptualization of an application domain in terms of concepts and relationships.
[Figure: an example concept hierarchy (Giunchiglia & Shvaiko, 2007).]
Concept Hierarchies: NOTEs
Examples are classification hierarchies, e.g., … and directories (catalogs).
Classification hierarchies / Web directories are sometimes referred to as lightweight ontologies (Uschold & Gruninger, 2004). However:
they are not ontologies, as they lack a formal semantics (semi-formal vs. formal);
they don’t formalize class instances.
The Matching Problems
A matching problem (syntactic or semantic) is a problem on graphs summarized as:
Given two finite graphs, is there a matching between the (nodes of the) two graphs?
In other words: given two graph-like structures (e.g., concept hierarchies or ontologies), produce a mapping between the nodes of the graphs that semantically correspond to each other.
Matching Procedures
A matching problem can be decomposed into two steps:
1. Extract the graphs from the conceptual models under consideration;
2. Match the resulting graphs.
Below we show some examples of step 1. (We follow [Giunchiglia & Shvaiko, 2007].)
Syntactic vs. Semantic Matching
Syntactic matching: relations are computed between labels at nodes; R = {x ∈ [0,1]}. Note: all previous systems are syntactic…
Semantic matching: relations are computed between concepts at nodes; R = {=, ⊑, ⊒, ⊥, ⊓}. Note: needed for proper labeling of context mappings.
Semantic Matching
A mapping element is a 4-tuple <IDij, n1i, n2j, R>, where:
IDij is a unique identifier of the given mapping element;
n1i is the i-th node of the first graph;
n2j is the j-th node of the second graph;
R specifies a semantic relation between the concepts at the given nodes.
Semantic Matching: given two graphs G1 and G2, for any node n1i ∈ G1, find the strongest semantic relation R holding with node n2j ∈ G2.
Computed R’s, listed in decreasing binding strength order: equivalence {=}; more general/specific {⊒, ⊑}; mismatch {⊥}; overlapping {⊓}.
Familiar?
Example
[Figure: the Images tree (1 Images, 2 Europe, 3 Italy, 4 Austria) and the Pictures tree (1 Europe, 2 Pictures, 3 Wine and Cheese, 4 Italy, 5 Austria), with the following mapping elements drawn between them:]
<ID22, 2, 2, =>
<ID21, 2, 1, ⊑>
<ID24, 2, 4, ⊒>
S-Match Algorithm
Four macro steps. Given two labeled trees T1 and T2, do:
1. For all labels in T1 and T2, compute concepts at labels;
2. For all nodes in T1 and T2, compute concepts at nodes;
3. For all pairs of labels in T1 and T2, compute relations between concepts at labels;
4. For all pairs of nodes in T1 and T2, compute relations between concepts at nodes.
Steps 1 and 2 constitute the preprocessing phase, and are executed once and then each time the schema/ontology is changed (OFF-LINE part). This is the SAME as the procedure of formalizing a classification into a lightweight ontology, as discussed in the last lecture.
Steps 3 and 4 constitute the matching phase, and are executed every time the two schemas/ontologies are to be matched (ON-LINE part).
Step 1: Compute concepts at labels
The idea:
translate natural language expressions into an internal formal language;
compute concepts based on the possible senses of the words in a label and their interrelations.
Preprocessing:
Tokenization. Labels are parsed into tokens (according to punctuation, spaces, etc.). E.g., Wine and Cheese → <Wine, and, Cheese>;
Lemmatization. Tokens are morphologically analyzed in order to find all their possible basic forms. E.g., Images → Image;
Building atomic concepts. An oracle (WordNet) is used to extract the senses of the lemmatized tokens. E.g., Image has 8 senses, 7 as a noun and 1 as a verb;
Building complex concepts. Prepositions, conjunctions, etc. are translated into logical connectives and used to build complex concepts out of the atomic concepts.
E.g., CWine and Cheese = <Wine, ⋃(WNWine)> ⊔ <Cheese, ⋃(WNCheese)>,
where ⋃ is the union of the senses that WordNet attaches to the lemmatized tokens.
Step 2: Compute concepts at nodes
The idea: extend concepts at labels by capturing the knowledge residing in the structure of the graph, in order to define the context in which the given concept at a label occurs.
Computation (basic case): the concept at a node n is computed as the intersection of the concepts at labels located above the given node, including the node itself.
[Figure: a tree with root Europe (1), children Pictures (2) and Wine and Cheese (3), and Italy (4) and Austria (5) under Pictures.]
C4 = CEurope ⊓ CPictures ⊓ CItaly
Step 3: Compute relations between concepts at labels
The idea: exploit a priori knowledge, e.g., lexical and domain knowledge, with the help of element-level semantic matchers.
Results of Step 3:
[Figure: trees T1 (Europe, Pictures, Wine and Cheese, Italy, Austria) and T2 (Images, Europe, Italy, Austria), and the matrix of relations between the concepts at labels of T1 (CEurope, CPictures, CItaly, CAustria, CWine, CCheese) and of T2 (CEurope, CImages, CItaly, CAustria): CEurope = CEurope, CPictures = CImages, CItaly = CItaly, CAustria = CAustria.]
Don’t hurry to Step 4!
What have we done in Step 3?
Found semantic relations. What for?
To build the base of semantic relations for further matching.
What is a ‘relation base’ in the logic sense?
A TBox! We are building the TBox!
Step 4: Compute relations between concepts at nodes
The idea: reduce the matching problem to a validity problem.
Context: the relations between concepts at labels.
Wffrel(C1i, C2j): the relation to be proved (with C1i in Tree1 and C2j in Tree2).
Translate into propositional logic:
C1i = C2j is translated into C1i ↔ C2j;
C1i subsumes C2j is translated into C2j → C1i;
C1i ⊥ C2j is translated into ¬(C1i ∧ C2j).
Prove that Context → Wffrel(C1i, C2j) is valid, with Rel = {=, ⊑, ⊒, ⊥, ⊓}.
A propositional formula is valid iff its negation is unsatisfiable (SAT deciders are sound and complete…).
Pseudo code of Step 4
i, j, N1, N2: int;
context, goal: wff;
n1, n2: node;
T1, T2: tree of (node);
relation = {=, ⊑, ⊒, ⊥};
ClabMatrix(N1, N2), CnodMatrix(N1, N2), relation: relation

function mkCnodMatrix(T1, T2, ClabMatrix) {
  for (i = 0; i < N1; i++) do
    for (j = 0; j < N2; j++) do
      CnodMatrix(i, j) := NodeMatch(T1(i), T2(j), ClabMatrix) }

function NodeMatch(n1, n2, ClabMatrix) {
  context := mkcontext(n1, n2, ClabMatrix, context);
  foreach (relation in <=, ⊑, ⊒, ⊥>) do {
    goal := w2r(mkwff(relation, GetCnod(n1), GetCnod(n2)));
    if VALID(mkwff(→, context, goal))
      return relation; }
  return IDK; }

Validity Reasoning Problem:
Context ⊨ goal?
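The pseudocode can be made runnable in Python for the running example. This is a sketch under stated assumptions, not the S-Match implementation: formulas are Python functions over truth assignments, VALID is decided by brute-force enumeration instead of a SAT solver, the context contains only the Step 3 equivalences (Images = Pictures, Europe = Europe, Italy = Italy, Austria = Austria), and “Wine and Cheese” is read as Wine ⊔ Cheese as in Step 1. T1 is taken to be the Images tree and T2 the Pictures tree, matching the context formula (C1Images ↔ C2Pictures) of the example that follows; all names are hypothetical.

```python
from itertools import product


# Formulas are functions from a truth assignment (dict: atom -> bool) to bool.
def atom(name):
    return lambda v: v[name]


def conj(*fs):
    return lambda v: all(f(v) for f in fs)


def disj(*fs):
    return lambda v: any(f(v) for f in fs)


def iff(a, b):
    return lambda v: a(v) == b(v)


# mkwff: goal formula for each relation, in decreasing binding strength.
GOALS = [
    ("=", iff),
    ("⊑", lambda a, b: lambda v: (not a(v)) or b(v)),
    ("⊒", lambda a, b: lambda v: (not b(v)) or a(v)),
    ("⊥", lambda a, b: lambda v: not (a(v) and b(v))),
]


def valid(context, goal, atoms):
    """VALID(context -> goal): no assignment makes context true and goal false."""
    for values in product([False, True], repeat=len(atoms)):
        v = dict(zip(atoms, values))
        if context(v) and not goal(v):
            return False
    return True


def node_match(c1, c2, context, atoms):
    for relation, mkwff in GOALS:
        if valid(context, mkwff(c1, c2), atoms):
            return relation
    return "IDK"  # nothing proved, e.g. the concepts only overlap


def mk_cnod_matrix(cnod1, cnod2, context, atoms):
    return {(i, j): node_match(c1, c2, context, atoms)
            for i, c1 in cnod1.items() for j, c2 in cnod2.items()}


ATOMS = ["1Images", "1Europe", "1Italy", "1Austria",
         "2Europe", "2Pictures", "2Wine", "2Cheese", "2Italy", "2Austria"]
I1, E1, It1, A1, E2, P2, W2, Ch2, It2, A2 = (atom(n) for n in ATOMS)

# Context: relations between concepts at labels found in Step 3.
CONTEXT = conj(iff(I1, P2), iff(E1, E2), iff(It1, It2), iff(A1, A2))

# Concepts at nodes (conjunction of the labels from the root down).
T1 = {1: I1, 2: conj(I1, E1), 3: conj(I1, E1, It1), 4: conj(I1, E1, A1)}
T2 = {1: E2, 2: conj(E2, P2), 3: conj(E2, disj(W2, Ch2)),
      4: conj(E2, P2, It2), 5: conj(E2, P2, A2)}

matrix = mk_cnod_matrix(T1, T2, CONTEXT, ATOMS)
print(matrix[(2, 2)], matrix[(2, 1)], matrix[(2, 4)])  # = ⊑ ⊒
```

The three printed relations correspond to the mapping elements <ID22, 2, 2, =>, <ID21, 2, 1, ⊑> and <ID24, 2, 4, ⊒>. Without disjointness axioms in the context (e.g. Italy ⊥ Austria), ⊥ is never proved here, so such pairs fall back to IDK.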
Example
Suppose we want to check whether C12 = C22.
[Figure: matrix of the relations between concepts at labels computed in Step 3.]
(C1Images ↔ C2Pictures) ∧ (C1Europe ↔ C2Europe) → (C12 ↔ C22)
where the left-hand side of → is the Context and C12 ↔ C22 is the Goal.
Examples
[Figure: the Pictures tree (Europe, Pictures, Wine and Cheese, Italy, Austria) and the Images tree (Images, Europe, Italy, Austria), drawn four times to show the node pairs being compared; the first pair is marked =.]
Short Summary
What is done in Step 4? Implicit semantic relations are found.
How? By validity reasoning based on the TBox built in Step 3!
What is working in the back end of semantic matching?
ClassL reasoning!
A Real S-Match System Structure
Testing Methodology
Measuring match quality
Expert mappings are inherently subjective. Two degrees of freedom:
Directionality
Use of oracles
Indicators:
Precision, [0,1] (correctness criterion: how many false positives?)
Recall, [0,1] (completeness criterion: how many false negatives?)
Overall, [-1,1]
F-measure, [0,1]
Time, sec.
Matching systems: S-Match vs. Cupid, COMA and SF as implemented in Rondo.
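The slide lists the indicators without formulas. The sketch below uses the definitions common in schema-matching evaluations, which is an assumption here: Precision = |expected ∩ found| / |found|; Recall = |expected ∩ found| / |expected|; Overall = Recall · (2 - 1/Precision), which equals (true positives - false positives) / |expected| and becomes negative when there are more false than true positives; and F-measure as the harmonic mean of Precision and Recall. The expected and found mapping sets are hypothetical.

```python
def match_quality(expected, found):
    """Precision, Recall, Overall and F-measure of a set of found mappings."""
    tp = len(expected & found)   # correct mappings that were found
    fp = len(found - expected)   # found mappings that are wrong
    precision = tp / len(found) if found else 0.0
    recall = tp / len(expected) if expected else 0.0
    # Overall = Recall * (2 - 1/Precision), rewritten to avoid dividing by zero.
    overall = (tp - fp) / len(expected) if expected else 0.0
    f_measure = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, overall, f_measure


# Hypothetical mapping elements (n1, n2, R) between the two example trees.
expected = {(2, 2, "="), (2, 1, "⊑"), (2, 4, "⊒"), (3, 4, "=")}
found = {(2, 2, "="), (2, 1, "⊑"), (3, 3, "⊑")}
print(match_quality(expected, found))
```

Here two of the three found mappings are correct and two of the four expected ones are found, so Precision is 2/3, Recall is 1/2 and Overall is (2 - 1)/4.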
Preliminary Experimental Results
Average Results
[Figure: bar chart of average Precision, Recall, Overall and F-measure (scale 0.0 to 1.0) and of matching time (0 to 12 sec) for Rondo, Cupid, COMA and S-Match.]
Three experiments, with test cases from different domains.
Some characteristics of the test cases: #nodes 4-39, depth 2-3.
PC: PIV 1.7 GHz; 256 MB RAM; Win XP.
References & Credits
References:
F. Giunchiglia, P. Shvaiko. “Semantic matching.” Knowledge Engineering Review, 18(3):265-280, 2003.
F. Giunchiglia, M. Marchese, I. Zaihrayeu. “Encoding Classifications into Lightweight Ontologies.” Journal of Data Semantics VIII, Springer-Verlag LNCS 4380, pp. 57-81, 2007.
F. Giunchiglia, I. Zaihrayeu. “Lightweight Ontologies.” Encyclopedia of Database Systems, Springer-Verlag, 2008. Available as a DIT Technical Report.