
Logics for Data and Knowledge Representation

Date post: 02-Jan-2016
Upload: zachary-calhoun
Page 1: L ogics  for  D ata  and  K nowledge R epresentation

Logics for Data and Knowledge Representation

Application of (Ground) ClassL

Page 2: L ogics  for  D ata  and  K nowledge R epresentation

Outline

Ontologies

Lightweight Ontologies

Classifications

Optimization of Classifications

Document Classification in LOs

Query-answering in LOs

Semantic Matching

2

Page 3: L ogics  for  D ata  and  K nowledge R epresentation

Ontology

Ontologies are explicit specifications of conceptualizations.

They are often thought of as directed graphs whose nodes represent concepts and whose edges represent relations between concepts.

[Figure: example ontology graph over the concepts Animal, Bird, Mammal, Predator, Herbivore, Goat, Tiger, Chicken, Cat, Head and Body, with edges labeled Is-a, Part-of and Eats.]

Page 4: L ogics  for  D ata  and  K nowledge R epresentation

Concept

The notion of concept is understood as defined in Knowledge Representation, i.e., as a set of objects or individuals.

This set is called the concept extension or the concept interpretation.

Concepts are often lexically defined, i.e. they have natural language names which are used to describe the concept extensions.

4

Page 5: L ogics  for  D ata  and  K nowledge R epresentation

Relation

The notion of relation is understood as a set of ordered pairs, with the two items of each pair taken from the source concept and the target concept respectively.

The backbone structure of the ontology graph is a taxonomy in which the relations are ‘is-a’, ‘part-of’ and ‘instance-of’ whereas the remaining structure of the graph supplies auxiliary information about the modeled domain and may include relations like ‘located-in’, ‘sibling-of’, ‘ant’, etc.

5

Page 6: L ogics  for  D ata  and  K nowledge R epresentation

Ontology as a graph

Mathematically, an ontology can be defined as a graph, i.e., as an ordered pair

O=<V, E>

in which V is the set of vertices describing the concepts and E is the set of edges describing relations.

6

Page 7: L ogics  for  D ata  and  K nowledge R epresentation

Tree-like Ontologies

Take the ontology in the previous slide and remove the auxiliary relations…

… we get a tree-like ontology consisting of the backbone structure with ‘is-a’, ‘part-of’ and even ‘instance-of’ relations.

They are informal Lightweight Ontologies.
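The pruning step can be sketched in a few lines of Python. This is only an illustration: the edge list below is an assumed reading of the example figure, and the names E, V, BACKBONE and backbone are hypothetical, not part of the slides.

```python
# An ontology O = <V, E>: edges are (source, relation, target) triples.
# The concrete edges are an assumption made for illustration only.
E = {
    ("Bird", "is-a", "Animal"),
    ("Mammal", "is-a", "Animal"),
    ("Predator", "is-a", "Mammal"),
    ("Chicken", "is-a", "Bird"),
    ("Cat", "is-a", "Predator"),
    ("Head", "part-of", "Animal"),
    ("Cat", "eats", "Chicken"),
}
# The vertex set is recovered from the endpoints of the edges.
V = {node for source, _, target in E for node in (source, target)}

# Relations that form the backbone taxonomy according to the slides.
BACKBONE = {"is-a", "part-of", "instance-of"}


def backbone(edges):
    """Keep only backbone edges, dropping auxiliary ones such as 'eats'."""
    return {edge for edge in edges if edge[1] in BACKBONE}


tree_like = backbone(E)
```

Only the 'eats' edge is dropped here; whether the remaining edges actually form a tree depends on the input graph and is not checked by this sketch.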

[Figure: the example ontology graph repeated, over the concepts Animal, Bird, Mammal, Predator, Herbivore, Goat, Tiger, Chicken, Cat, Head and Body, with edges labeled Is-a, Part-of and Eats.]

Page 8: L ogics  for  D ata  and  K nowledge R epresentation

Descriptive VS. Classification Ontologies

Some ontologies are used to describe a piece of the world, such as the Gene ontology, Industry ontology, etc. The purpose is to give a clear description of the world. This is usually the first idea that comes to mind when people talk about ontologies.

Some other ontologies are used to classify things, such as books, documents, web pages, etc. The aim is to provide a domain specific category to organize individuals accordingly. Such ontologies usually take the form of classifications with or without explicit meaningful links.

We will see the difference further, in the transformation into formal Lightweight Ontologies.

8

Page 9: L ogics  for  D ata  and  K nowledge R epresentation

Why ‘Lightweight’ Ontologies?

Two observations:

1. The majority of existing ontologies are ‘simple’ taxonomies or classifications, i.e., categories to classify resources.

2. Ontologies with arbitrary relations do exist, but no intuitive reasoning techniques support such ontologies in general.

… so we need ‘lightweight’ ontologies.

Page 10: L ogics  for  D ata  and  K nowledge R epresentation

Outline

Ontologies

Lightweight Ontologies

Classifications

Optimization of Classifications

Document Classification in LOs

Query-answering in LOs

Semantic Matching

10

Page 11: L ogics  for  D ata  and  K nowledge R epresentation

Lightweight Ontologies

A (formal) lightweight ontology is a triple

O = <N,E,C> where

N is a finite set of nodes,

E is a set of edges on N, such that <N,E> is a rooted tree, and

C is a finite set of concepts expressed in a formal language F, such that for any node ni ∈ N, there is one and only one concept ci ∈ C, and, if ni is the parent node for nj, then cj ⊑ ci.
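A minimal sketch of the definition, under a strong simplifying assumption: each concept is given by its extension in one fixed interpretation (a Python set), so that cj ⊑ ci is checked as set inclusion rather than as logical validity. The function names and the sample data are hypothetical.

```python
def is_rooted_tree(nodes, edges):
    """Check that <N, E> is a rooted tree; edges are (parent, child) pairs."""
    parent = {}
    for p, c in edges:
        if c in parent:              # a node with two parents is not a tree
            return False
        parent[c] = p
    roots = [n for n in nodes if n not in parent]
    if len(roots) != 1:              # exactly one root is required
        return False
    for n in nodes:                  # every node must reach the root, no cycles
        seen = set()
        while n in parent:
            if n in seen:
                return False
            seen.add(n)
            n = parent[n]
        if n != roots[0]:
            return False
    return True


def is_lightweight_ontology(nodes, edges, extension):
    """Check the triple <N, E, C>, with C given as node -> extension set."""
    if not is_rooted_tree(nodes, edges):
        return False
    if any(n not in extension for n in nodes):   # one concept per node
        return False
    # child concept must be subsumed by the parent concept (set inclusion)
    return all(extension[c] <= extension[p] for p, c in edges)


NODES = {"Animal", "Bird", "Mammal", "Chicken"}
EDGES = {("Animal", "Bird"), ("Animal", "Mammal"), ("Bird", "Chicken")}
EXTENSION = {
    "Animal": {"tweety", "tom", "henrietta"},
    "Bird": {"tweety", "henrietta"},
    "Mammal": {"tom"},
    "Chicken": {"henrietta"},
}
```

With these sample extensions the check succeeds; placing a mammal in the extension of Chicken would violate Chicken ⊑ Bird and make it fail.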

11

Page 12: L ogics  for  D ata  and  K nowledge R epresentation

From Tree-like Ontologies to LOs

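[Figure: on the left, the tree-like ontology over Animal, Bird, Mammal, Predator, Herbivore, Goat, Tiger, Chicken, Cat, Head and Body, with Is-a and Part-of edges; on the right, the lightweight ontology over Animal, Bird, Mammal, Predator, Herbivore, Goat, Tiger, Chicken and Cat, in which every edge is ⊑. Head and Body, linked by Part-of, are shown separately.]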

Page 13: L ogics  for  D ata  and  K nowledge R epresentation

In Classification Semantics…

[Figure: the same tree-like ontology on the left; on the right, the lightweight ontology under classification semantics, in which Head and Body are kept and the Part-of edges are also rendered as ⊑.]

Page 14: L ogics  for  D ata  and  K nowledge R epresentation

From Tree-like Ontologies to LOs cont.

For a descriptive tree-like ontology, the backbone taxonomy of ‘is-a’ intuitively coincides with the ‘subsumption’ relation in LOs. But ‘part-of’ relations have to be modeled as a new kind of binary relation in order to preserve the semantics.

For a classification ontology, the semantics behind the labels of the nodes is their extensional interpretation, i.e. the documents (books, websites, etc.) that should be classified under the nodes. Therefore, the ‘part-of’ relation also follows the intuition of ‘subsumption’ and can be transformed directly into ‘⊑’ in the target LOs.

14

Page 15: L ogics  for  D ata  and  K nowledge R epresentation

Populated (Lightweight) Ontologies

In Information Retrieval, the term classification is seen as the process of arranging a set of objects (e.g., documents) into categories or classes.

A classification ontology is said to be populated if a set of objects has been classified under ‘proper’ nodes.

Thus a populated (lightweight) ontology contains a new type of link: instance-of.

15

Page 16: L ogics  for  D ata  and  K nowledge R epresentation

Example of a Populated Ontology

[Figure: the lightweight ontology over Animal, Bird, Mammal, Predator, Herbivore, Goat, Tiger, Chicken, Cat, Head and Body (all edges ⊑), populated through Instance-of links with the documents ‘Chicken Soup’, ‘How to Raise Chicken’, ‘Tom and Jerry’, ‘www.protectTiger.org’, …]

Page 17: L ogics  for  D ata  and  K nowledge R epresentation

Lightweight Ontologies in ClassL: TBox

Subsumption terminologies:

‘… C is a finite set of concepts expressed in a formal language F, such that for any node ni ∈ N, there is one and only one concept ci ∈ C, and, if ni is the parent node for nj, then cj ⊑ ci.’

1. Bird ⊑ Animal

2. Mammal ⊑ Animal

3. Chicken ⊑ Bird

4. Cat ⊑ Predator

5. …

Observation: a tree-like ontology can be transformed into a lightweight ontology, but not vice versa.

Page 18: L ogics  for  D ata  and  K nowledge R epresentation

Populated LOs in ClassL: TBox+ABox

Subsumption terminologies: ‘… cj ⊑ ci.’

‘Instance of’ links: ‘concept assertion!’

1. …

2. …

3. …

4. …

5. Chicken(ChickenSoup)

6. Cat(TomAndJerry)

7. …

18

Page 19: L ogics  for  D ata  and  K nowledge R epresentation

Outline

Ontologies

Lightweight Ontologies

Classifications

Optimization of Classifications

Document Classification in LOs

Query-answering in LOs

Semantic Matching

19

Page 20: L ogics  for  D ata  and  K nowledge R epresentation

Classifications…

Classification hierarchies are easy to use... for humans.

Classification hierarchies are pervasive (Google, Yahoo, Amazon, our PC directories, email folders, address book, etc.).

Classification hierarchies are largely used in industry (Google, Yahoo, eBay, Amazon, BBC, CNN, libraries, etc.).

Classification hierarchies have been studied for a very long time (e.g., the Dewey Decimal Classification system, DDC, the Library of Congress Classification system, LCC, etc.).

20

Page 21: L ogics  for  D ata  and  K nowledge R epresentation

Classification Example: Yahoo! Directory

21

Page 22: L ogics  for  D ata  and  K nowledge R epresentation

Classification Example: Email Folders

22

Page 23: L ogics  for  D ata  and  K nowledge R epresentation

Classification Example: E-Commerce Category

23

Page 24: L ogics  for  D ata  and  K nowledge R epresentation

Classifications .. more

Classification hierarchies are lightweight (no roles, trees or simple DAGs, …).

Classification hierarchies are a kind of concept hierarchy.

Labels are natural language sentences; useful but hard to deal with in an automated way.

Links are of the kind “child-of” (e.g. “economy child-of Europe”), where in an ontology you would have {instance-of}, or roles, or {is-a} links.

There are no clear semantics for either the labels at nodes or the links.

How to use such informal information?

Page 25: L ogics  for  D ata  and  K nowledge R epresentation

Recall: Lightweight Ontologies

A (formal) lightweight ontology is a triple

O = <N,E,C>, where

N is a finite set of nodes,

E is a set of edges on N, such that <N,E> is a rooted tree, and

C is a finite set of concepts expressed in a formal language F, such that for any node ni ∈ N, there is one and only one concept ci ∈ C, and, if ni is the parent node for nj, then cj ⊑ ci.

Annotations on the slide: a classification already has N and E; C remains to be fixed.

Page 26: L ogics  for  D ata  and  K nowledge R epresentation

What do LOs Bring?

We know that a lightweight ontology is a formal conceptualization of a domain in terms of concepts and {is-a, instance-of} relationships.

Lightweight ontologies (LOs) add a formal semantics and {instance-of} relationships to classification hierarchies.

In short: LOs make classifications formal!

26

Page 27: L ogics  for  D ata  and  K nowledge R epresentation

LOs and Ground Class Logic

Ground ClassL provides a formal language (syntax + semantics) to model lightweight ontologies, where:

concepts are modeled by propositions and formulas;

the ‘is-a’ relationship is modeled by subsumption (⊑);

and the ‘is-instance-of’ relationship is modeled by individual assertion (i.e., wffs like P(a)).

27

Page 28: L ogics  for  D ata  and  K nowledge R epresentation

Label Semantics

Natural language words are often ambiguous.

E.g. Java (an island, a beverage, an OO programming language)

When used with other words in a label, improper senses can be pruned.

E.g., “Java Language” – only the 3rd sense of Java is preserved.

[Figure: a classification branch at levels 0 to 4: Subjects (1), Computers and Internet (3), Programming (5), Java Language (7), Java Beans (8).]

Page 29: L ogics  for  D ata  and  K nowledge R epresentation

From NL Labels to Labels in Class Logic

There are several approaches to rewriting a natural language label into a ClassL proposition. Following (Giunchiglia et al., 2007), we may distinguish four steps:

1. Tokenization (get distinct words); Italian Pictures → ‘Italian’, ‘Pictures’

2. Word stemming (get to a basic form); Pictures → picture

3. Rewrite each word into its proposition; picture → picture-noun-1 ⊔ picture-noun-2 ⊔ … ⊔ picture-verb-2

4. Prune inconsistent senses. picture-noun-1 ⊔ picture-noun-2 ⊔ … ⊔ picture-verb-2 → pictureN1
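A minimal sketch of steps 1 to 3, assuming a tiny hand-written sense table in place of WordNet and a naive plural-stripping stemmer; both are hypothetical simplifications. Step 4 (sense pruning) is not attempted. Each word becomes the disjunction of its senses, and the words of a label are conjoined.

```python
# Hypothetical sense inventory standing in for a dictionary such as WordNet.
SENSES = {
    "java": ["Java#1", "Java#2", "Java#3"],
    "bean": ["Bean#1", "Bean#2"],
}


def tokenize(label):
    """Step 1: split a label into lower-cased words."""
    return label.lower().split()


def stem(token):
    """Step 2 (naive assumption): strip a plural 's' from longer words."""
    if token.endswith("s") and len(token) > 3:
        return token[:-1]
    return token


def word_to_proposition(word):
    """Step 3: a word becomes the disjunction of its known senses."""
    senses = SENSES.get(word, [word.capitalize() + "#1"])
    return "(" + " ⊔ ".join(senses) + ")"


def label_to_proposition(label):
    """Conjoin the propositions of all words in the label."""
    return " ⊓ ".join(word_to_proposition(stem(t)) for t in tokenize(label))


java_beans = label_to_proposition("Java Beans")
```

Words missing from the table fall back to a single made-up sense, which keeps the sketch total but has no linguistic justification.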

29

Page 30: L ogics  for  D ata  and  K nowledge R epresentation

Class Logic Label Examples

E.g.1: “Java” becomes the proposition

Java#1 ⊔ Java#2 ⊔ Java#3

where Java#i is a propositional variable representing the ith-sense of the word “Java” according to a dictionary (e.g., WordNet).

E.g.2: “Java Beans” becomes:

(Java#1 ⊔ Java#2 ⊔ Java#3)⊓(Bean#1 ⊔ Bean#2)

30

Page 31: L ogics  for  D ata  and  K nowledge R epresentation

Advantages of Propositions

NL labels are ambiguous, propositions are NOT!

Extensional semantics of propositions naturally maps nodes to real world objects.

Labels as propositions allow us to deal with the standard problems in classification (e.g., document classification, query-answering, and matching) by means of ClassL’s reasoning, mainly the SAT problem.

31

Page 32: L ogics  for  D ata  and  K nowledge R epresentation

Formalizing the Meaning of Links (1)

Child nodes in a classification are always considered in the context of their parent nodes.

Child nodes therefore specialize the meaning of the parent nodes.

Contextuality property of classifications.

32

Page 33: L ogics  for  D ata  and  K nowledge R epresentation

Formalizing the Meaning of Links (2)

General intersection relationship (a): can be used to represent facets. The meaning of node 2 is C = A ⊓ B.

Subsumption relationship (b): child nodes are specific cases of the parent nodes. The meaning of node 2 is B.

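[Figure: a parent node 1 labeled A with a child node 2 labeled B, and two set diagrams over A and B for cases (a) and (b); C appears in diagram (a).]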

Page 34: L ogics  for  D ata  and  K nowledge R epresentation

General Intersection Example

[Figure: general intersection example with the labels l1 = “Subjects”, l3 = “Computers and Internet” (hardware, software, networking, …) and l5 = “Programming” (scheduling, planning, computer programming).]

Page 35: L ogics  for  D ata  and  K nowledge R epresentation

Concept at a Node

Parental contextuality is formalized in ClassL by the notion of “concept at a node.”

A concept Cr at the root node r is the class proposition (label) used to denote the node.

A concept Ci at a node ni is the conjunction of a proposition Pi (label of ni) and the concept Cj at node nj parent to ni (if it has any parents).

In ClassL: Pi ⊓ Cj.

35

Page 36: L ogics  for  D ata  and  K nowledge R epresentation

Concept at a Node

A concept at a node ni can be computed as the conjunction of all the labels from the root of the classification hierarchy to ni.

Concepts at nodes capture the classification semantics by using the meaning of labels (propositions defined by using WordNet and a linguistic analysis) and the nodes' position.
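The root-to-node conjunction can be sketched as a walk up a parent table. The tree below follows the Europe / Pictures / Italy example used in these slides; the node numbering is read off the example figure, and the names PARENT, LABEL, path_labels and concept_at_node are hypothetical.

```python
# Parent of each node (None for the root) and the label at each node.
PARENT = {1: None, 2: 1, 3: 1, 4: 2, 5: 2}
LABEL = {1: "Europe", 2: "Pictures", 3: "Wine and Cheese", 4: "Italy", 5: "Austria"}


def path_labels(node):
    """Labels on the path from the root down to the given node."""
    labels = []
    while node is not None:
        labels.append(LABEL[node])
        node = PARENT[node]
    return list(reversed(labels))


def concept_at_node(node):
    """Concept at a node: conjunction of the label concepts on the path."""
    return " ⊓ ".join("C[" + label + "]" for label in path_labels(node))


c4 = concept_at_node(4)
```

Here C[label] merely stands for the ClassL proposition obtained from that label; the label propositions themselves are not computed by this sketch.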

36

Page 37: L ogics  for  D ata  and  K nowledge R epresentation

Concept at a Node: Example

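[Figure: a classification tree with root Europe (1), whose children are Pictures (2) and Wine and Cheese (3); Pictures has the children Italy (4) and Austria (5).]

In ClassL: C4 = CEurope ⊓ CPictures ⊓ CItaly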

Page 38: L ogics  for  D ata  and  K nowledge R epresentation

What have we done?

We have calculated the concepts at labels and the concepts at nodes.

In which format?

ClassL

Java#1 ⊔ Java#2 ⊔ Java#3

Ceurope ⊓ Cpictures ⊓ Citaly

We have built the ClassL formulas for each node!

38

Page 39: L ogics  for  D ata  and  K nowledge R epresentation

Distinctions Among Ontology, LO and CLS

[Figure: three graphs over the nodes A, B, C, D and E. Ontology: edges labeled Is-a, Instance-of, Part-of, Locate-in and Likes. Tree-like ontology (backbone taxonomy): edges labeled Is-a, Instance-of and Part-of. Classification (most common format): Child-of edges only. Under classification semantics, formalization yields a formal lightweight ontology with the concepts at nodes A, A⊓B, A⊓C, A⊓B⊓D and A⊓B⊓E. Captions: Descriptive Ontologies, Classification Ontologies.]

Page 40: L ogics  for  D ata  and  K nowledge R epresentation

Outline

Ontologies

Lightweight Ontologies

Classifications

Optimization of Classifications

Document Classification in LOs

Query-answering in LOs

Semantic Matching

40

Page 41: L ogics  for  D ata  and  K nowledge R epresentation

Rational LOs

LOs may not be perfect…

Reconstruct a LO based on the “most specific subsumer” relation.

Nodes get the parents which describe them most specifically while still being more general.

The new structure is called a Rational LO (RLO).

NOTE: classification semantics do not change.

[Figure: a LO before and after restructuring into a Rational LO, over the nodes EU, Schengen States, Italy, France, Germany and Pictures.]

Page 42: L ogics  for  D ata  and  K nowledge R epresentation

Optimization of Classifications

Problem: to find ‘the most specific subsumer’ of a given node.

Suppose we have, for all nodes in the LO, the concepts at labels in ClassL, i.e. the wffs obtained after NLP.

Then we can refer to the ‘subsumption’ reasoning service, which finds the minimal subsumer with respect to the ordering ‘⊑’.

E.g.: Italy ⊑ EU, SchengenState ⊑ EU, Italy ⊑ SchengenState…
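A minimal sketch of the search for a most specific subsumer, assuming the subsumption facts are given explicitly as pairs (here the three axioms of the example) and closed under transitivity by the code. In ClassL these facts would come from the subsumption reasoning service; the names AXIOMS, subsumers and most_specific_subsumers are hypothetical.

```python
# (a, b) reads "a ⊑ b"; the three axioms are taken from the example above.
AXIOMS = {
    ("Italy", "EU"),
    ("SchengenState", "EU"),
    ("Italy", "SchengenState"),
}


def subsumers(concept, axioms):
    """All concepts d with concept ⊑ d derivable through transitivity."""
    found, frontier = set(), {concept}
    while frontier:
        current = frontier.pop()
        for a, b in axioms:
            if a == current and b not in found:
                found.add(b)
                frontier.add(b)
    return found


def most_specific_subsumers(concept, axioms):
    """Subsumers of concept that have no other subsumer of concept below them."""
    candidates = subsumers(concept, axioms) - {concept}
    return {
        d for d in candidates
        if not any(e != d and d in subsumers(e, axioms) for e in candidates)
    }


italy_parent = most_specific_subsumers("Italy", AXIOMS)
```

EU is discarded for Italy because SchengenState lies between the two, so Italy would be re-attached under SchengenState.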

42

Page 43: L ogics  for  D ata  and  K nowledge R epresentation

Outline

Ontologies

Lightweight Ontologies

Classifications

Optimization of Classifications

Document Classification in LOs

Query-answering in LOs

Semantic Matching

43

Page 44: L ogics  for  D ata  and  K nowledge R epresentation

Document Classification

Each document d in a classification is assigned a proposition Cd in ClassL.

Cd is called the document concept.

Cd is built from d in two steps:

1. keywords are retrieved from d by using standard text mining techniques.

2. keywords are converted into propositions by using the methodology discussed above.

44

Page 45: L ogics  for  D ata  and  K nowledge R epresentation

“Get specific” Rule

For any given document d and its concept Cd we classify d in each node ni such that:

1. ⊨ Cd ⊑ Ci (i.e. the concept at node ni is more general than Cd);

2. and there is no node nj (j ≠ i), whose concept at node Cj is more specific than Ci and more general than Cd:

⊨ Cj ⊑ Ci and ⊨ Cd ⊑ Cj.
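The rule can be sketched under a simplifying assumption: every concept is a conjunction of independent atoms, stored as a frozenset, with no background TBox. In that fragment ⊨ C ⊑ D holds exactly when every atom of D also occurs in C. "More specific" is read as strictly more specific, which is an interpretation of condition 2. The node concepts A, A⊓B, … are made up for illustration.

```python
def subsumed(c, d):
    """⊨ c ⊑ d for conjunctions of independent atoms (no TBox assumed)."""
    return d <= c


def get_specific(cd, nodes):
    """Nodes whose concept subsumes cd and that have no strictly more
    specific node concept lying between them and cd."""
    candidates = {n for n, ci in nodes.items() if subsumed(cd, ci)}
    return {
        i for i in candidates
        # frozenset '>' is proper superset: nodes[j] is strictly more specific
        if not any(nodes[j] > nodes[i] for j in candidates)
    }


# Concepts at nodes: A, A⊓B, A⊓C, A⊓B⊓D, A⊓B⊓E (illustrative only).
NODES = {
    1: frozenset({"A"}),
    2: frozenset({"A", "B"}),
    3: frozenset({"A", "C"}),
    4: frozenset({"A", "B", "D"}),
    5: frozenset({"A", "B", "E"}),
}

one_node = get_specific(frozenset({"A", "B", "D", "X"}), NODES)
two_nodes = get_specific(frozenset({"A", "B", "C"}), NODES)
```

The second call shows that a document may be classified in more than one node when two incomparable node concepts both subsume its concept.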

(Annotation on the slide: subsumption reasoning of ClassL.)

Page 46: L ogics  for  D ata  and  K nowledge R epresentation

Example

Suppose we need to classify “Professional Java, JDK-5th Edition” by W. Clay Richardson et al.

The document concept of this document d is: Cd = Java#3 ⊓ Programming#2.

Node 7 is the only node which conforms to the “get specific” rule.

[Figure: classification tree at levels 0 to 4: Subjects (1); Business and Investing (2) and Computers and Internet (3); Small Business and Entrepreneurship (4) and Programming (5); New Business Enterprises (6) and Java Language (7); Java Beans (8).]

Page 47: L ogics  for  D ata  and  K nowledge R epresentation

Example (cont’)

Suppose we need to classify “Visual Basic.Net Programming for Business” by Philip A. Koneman.

The document concept of this document d is: Cd = VisualBasicNet#1 ⊓ Programming#2 ⊓ Business#1.

Nodes 2 and 5 conform to the “get specific” rule.

[Figure: the same classification tree at levels 0 to 4: Subjects (1); Business and Investing (2) and Computers and Internet (3); Small Business and Entrepreneurship (4) and Programming (5); New Business Enterprises (6) and Java Language (7); Java Beans (8).]

Page 48: L ogics  for  D ata  and  K nowledge R epresentation

What have we done so far?

Classify documents.

How? Get specific algorithm!

But how to implement the algorithm?

ClassL!

We are reasoning with the ‘Concept Realization’ service of ClassL!

(With an empty ABox.)

⊨ Cj ⊑ Ci and ⊨ Cd ⊑ Cj

48

Page 49: L ogics  for  D ata  and  K nowledge R epresentation

Outline

Ontologies

Lightweight Ontologies

Classifications

Optimization of Classifications

Document Classification in LOs

Query-answering in LOs

Semantic Matching

49

Page 50: L ogics  for  D ata  and  K nowledge R epresentation

Intuitive Query-answering

Query-answering on a hierarchy of documents, based on a query q given as a set of keywords, is defined in two steps:

1. The ClassL proposition Cq is built from q by converting q’s keywords as described above.

2. The set of answers (retrieval set) to q is defined as a set of subsumption checking problems in Ground ClassL:

Aq = {d ∈ document | T ⊨ Cd ⊑ Cq}.

50

Page 51: L ogics  for  D ata  and  K nowledge R epresentation

Query-Answering: A Problem

Searching on all the documents may be expensive (millions of documents classified).

We define a set of nodes which contain only answers to a query q as follows:

Nsq = {ni node | T ⊨ Ci ⊑ Cq}

NOTE: Each document d in a node ni in Nsq is an answer to the query q, because T ⊨ Cd ⊑ Ci by definition of classification.

Thus every document d classified in a node of Nsq belongs to Aq.

51

Page 52: L ogics  for  D ata  and  K nowledge R epresentation

Query-Answering: Classification Set

We extend Nsq (named sound classification answer) by adding a set of nodes (named query classification set) defined as:

Clq = {ni node | d ∈ ni and ⊨ Cd ≡ Cq}

i.e., the nodes which constitute the classification set of a document d, whose concept Cd is equivalent to Cq.

52

Page 53: L ogics  for  D ata  and  K nowledge R epresentation

Query-Answering: Sound Answer Set

The set of answers (retrieval set) to q is finally defined as the following set:

Asq =df {d ∈ ni | ni ∈ Nsq} ∪ {d ∈ ni | ni ∈ Clq and ⊨ Cd ⊑ Cq}.

Under this definition, the answers to a query are documents from nodes whose concepts are more specific than the query concept, together with those documents in the nodes of Clq whose own concepts are more specific than the query concept.
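A minimal sketch of Nsq, Clq and Asq, again assuming that concepts are conjunctions of independent atoms stored as frozensets (so ⊨ C ⊑ D is a superset test) and that Clq is obtained by applying the “get specific” rule to Cq itself. The nodes, documents and query are made-up toy data.

```python
def subsumed(c, d):
    """⊨ c ⊑ d for conjunctions of independent atoms (no TBox assumed)."""
    return d <= c


def get_specific(c, nodes):
    """Classification set of a concept c (see the "get specific" rule)."""
    candidates = {n for n, ci in nodes.items() if subsumed(c, ci)}
    return {i for i in candidates
            if not any(nodes[j] > nodes[i] for j in candidates)}


def sound_answer_set(cq, nodes, docs):
    """Asq: documents in Nsq nodes, plus matching documents in Clq nodes."""
    ns = {n for n, ci in nodes.items() if subsumed(ci, cq)}   # Nsq
    cl = get_specific(cq, nodes)                              # Clq
    return {d for d, (n, cd) in docs.items()
            if n in ns or (n in cl and subsumed(cd, cq))}


def all_answers(cq, docs):
    """Aq: brute-force subsumption check on every document."""
    return {d for d, (_, cd) in docs.items() if subsumed(cd, cq)}


NODES = {1: frozenset("A"), 2: frozenset("AB"), 3: frozenset("ABD")}
# document -> (node where it is classified, document concept)
DOCS = {
    "d1": (2, frozenset("ABX")),
    "d2": (2, frozenset("AB")),
    "d3": (3, frozenset("ABDX")),
}
CQ = frozenset("ABX")

asq = sound_answer_set(CQ, NODES, DOCS)
aq = all_answers(CQ, DOCS)
```

On this toy data every document in Asq is also in Aq, while d3 satisfies the subsumption test but is reached through neither Nsq nor Clq.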

53

Page 54: L ogics  for  D ata  and  K nowledge R epresentation

Example

Suppose that a user makes a query q, which is converted into Cq = Java#3 ⊓ Cobol#1, where Cobol#1 is “common business-oriented language.”

It can be shown that Nsq = {7,8}.

Exercise: show it.

[Figure: the classification branch at levels 0 to 4: Subjects (1), Computers and Internet (3), Programming (5), Java Language (7), Java Beans (8), with two nodes marked as Nsq.]

Page 55: L ogics  for  D ata  and  K nowledge R epresentation

Example (cont’)

It can be shown that Nsq = {7,8}.

“Java for COBOL programmers, 2nd ed.” is classified in node 2, so it is not an answer when using only Nsq = {7,8}.

We then consider Clq to compute more answers; among them are the documents in node 5.

[Figure: the same classification branch at levels 0 to 4: Subjects (1), Computers and Internet (3), Programming (5), Java Language (7), Java Beans (8), with one node marked as Clq.]

Page 56: L ogics  for  D ata  and  K nowledge R epresentation

Sound Answer Set: Remark

The set Asq is sound (i.e., contains answers to q), but not complete (i.e., does not contain all the answers to q).

See the next example.

[Figure: a two-node classification, EuropeanUnion (1) with child Pictures (2), and a document d.]

Page 57: L ogics  for  D ata  and  K nowledge R epresentation

Sound Answer Set Example

Suppose that a user makes a query q like “video or pictures of Italy,” which is converted into Cq = Italy#1 ⊓ (Video#2 ⊔ Pictures#1).

Cq is equivalent to Cq1 ⊔ Cq2, where:

Cq1 = Video#2 ⊓ Italy#1,

Cq2 = Pictures#1 ⊓ Italy#1.

[Figure: the two-node classification, EuropeanUnion (1) with child Pictures (2), and a document d.]

Page 58: L ogics  for  D ata  and  K nowledge R epresentation

Sound Answer Set: Example (cont’)

But not ⊨ C2 ⊑ Cq1,

hence a document d in node 2 about Rome, with Cd = Pictures#1 ⊓ Rome#1, is not retrieved, since:

Nsq = {ni | ⊨ Ci ⊑ Cq} = ∅ and

Clq = {1}, so d ∉ Asq.

(Asq is not complete)

[Figure: the two-node classification, EuropeanUnion (1) with child Pictures (2), and a document d.]

Page 59: L ogics  for  D ata  and  K nowledge R epresentation

Some Comments

The edge structure of a LO is considered neither for document classification nor for query answering.

The edge information becomes redundant, as it is implicitly encoded in the “concept at a node” notion.

There is more than one way to build a LO from a set of concepts at nodes.

59

Page 60: L ogics  for  D ata  and  K nowledge R epresentation

What have we done in Query Answering? Find the set of documents.

How? Find the concept that is subsumed by the query.

But how to implement it?

ClassL!

We are reasoning with the ‘Concept subsumption’ service of ClassL!

⊨Cd ⊑ Cq

60

Page 61: L ogics  for  D ata  and  K nowledge R epresentation

Outline

Ontologies

Lightweight Ontologies

Classifications

Optimization of Classifications

Document Classification in LOs

Query-answering in LOs

Semantic Matching

61

Page 62: L ogics  for  D ata  and  K nowledge R epresentation

Date: a matching?

62

Page 63: L ogics  for  D ata  and  K nowledge R epresentation

Why Matching?

Most popular forms of knowledge can be represented as graphs. The heterogeneity between knowledge graphs calls for making the relations between them explicit, such as semantic equivalence.

Some popular situations that can be modeled as a matching problem are:

Concept matching in semantic networks.

Schema matching in distributed databases.

Ontology matching (ontology “alignment”) in the Semantic Web.

63

Page 64: L ogics  for  D ata  and  K nowledge R epresentation

A Matching Problem (Example)

64

?

?

?

Page 65: L ogics  for  D ata  and  K nowledge R epresentation

Relational DB Schemas Let us consider the following relational database

(RDB) model, say “BANK”:

65

(Giunchiglia & Shvaiko, 2007)

Page 66: L ogics  for  D ata  and  K nowledge R epresentation

Relational DB Schemas: Representation 1 We can represent the RDB model “BANK” as a graph

(a tree) with root “BANK”:

The RDB model is first partitioned into relations, then attributes and data instances.

66

(Giunchiglia & Shvaiko, 2007)

Page 67: L ogics  for  D ata  and  K nowledge R epresentation

Relational DB Schemas: Representation 2 We can represent the RDB model “BANK” as a graph

(a tree) with root “BANK”:

The model is partitioned into relations, then into tuples, attributes and data instances.

67

(Giunchiglia & Shvaiko, 2007)

Page 68: L ogics  for  D ata  and  K nowledge R epresentation

Relational DB Schemas: NOTEs

Which of the two representations is preferable depends on the concrete task.

It is always possible to transform one representation into the other.

In contrast to the example of RDB “BANK”, DB schemas are seldom trees. More often, DB schemas are translated into Directed Acyclic Graphs (DAG’s).

68

Page 69: L ogics  for  D ata  and  K nowledge R epresentation

OODB Schemas

Let us consider the RDB “BANK” in terms of an object-oriented DB (OODB) schema:

BRANCH (Street, City, Zip)
PERSON (F_Name, L_Name)
STAFF : PERSON (Position, Salary, Manager)

The resulting graph is:

69

(Giunchiglia & Shvaiko, 2007)

Page 70: L ogics  for  D ata  and  K nowledge R epresentation

OODB Schemas: NOTEs

OODB schemas capture more semantics than relational DB schemas.

In particular, an OODB schema:

explicitly expresses subsumption relations between elements;

admits special types of arcs for part/whole relationships in terms of aggregation and composition.

70

Page 71: L ogics  for  D ata  and  K nowledge R epresentation

Semi-structured Data

Neither RDBs nor OODBs capture all the features of semi-structured or unstructured data (Buneman, 1997):

semi-structured data do not possess a regular structure (schemaless);

the “structure” of semi-structured data could be partial or even implicit.

Typical examples are: HTML and XML.

71

Page 72: L ogics  for  D ata  and  K nowledge R epresentation

XML Schemas

XML schemas can be represented as DAGs.

The graph from the RDB “BANK” could also be obtained from an XML schema.

72

(Giunchiglia & Shvaiko, 2007)

Page 73: L ogics  for  D ata  and  K nowledge R epresentation

XML Schemas: NOTEs

Often XML schemas represent hierarchical data models. In this case the only relationships between the elements are {is-a}.

Attributes in XML are used to represent extra information about data. There are no strict rules telling us when data should be represented as elements, or as attributes.

73

Page 74: L ogics  for  D ata  and  K nowledge R epresentation

Concept Hierarchies

Def. A concept hierarchy is a semi-formal conceptualization of an application domain in terms of concepts and relationships.

74

(Giunchiglia & Shvaiko, 2007)

Page 75: L ogics  for  D ata  and  K nowledge R epresentation

Concept Hierarchies: NOTEs

Examples are classification hierarchies and directories (catalogs).

Classification hierarchies / Web directories are sometimes referred to as lightweight ontologies (Uschold & Gruninger, 2004). However:

They are not ontologies, as they lack a formal semantics (semi-formal vs formal).

They don’t formalize class instances.

75

Page 76: L ogics  for  D ata  and  K nowledge R epresentation

The Matching Problems

A Matching Problem (syntactic or semantic) is a problem on graphs summarized as:

Given two finite graphs, is there a matching between the (nodes of the) two graphs?

In other words: given two graph-like structures (e.g., concept hierarchies or ontologies), produce a mapping between the nodes of the graphs that semantically correspond to each other.

76

Page 77: L ogics  for  D ata  and  K nowledge R epresentation

Matching Procedures

A matching problem can be decomposed into two steps:

1. Extract the graphs from the conceptual models under consideration;

2. Match the resulting graphs.

Below we show some examples of step 1. (We follow [Giunchiglia & Shvaiko, 2007].)

77

Page 78: L ogics  for  D ata  and  K nowledge R epresentation

Syntactic VS. Semantic

Matching divides into syntactic matching and semantic matching.

Syntactic Matching: relations are computed between labels at nodes.

R = {x ∈ [0,1]}

Note: all previous systems are syntactic…

Semantic Matching: relations are computed between concepts at nodes.

R = {=, ⊑, ⊒, ⊥, ⊓}

Note: needed for proper labeling of context mappings.

Page 79: L ogics  for  D ata  and  K nowledge R epresentation

Semantic Matching

A mapping element is a 4-tuple < IDij, n1i, n2j, R >, where

IDij is a unique identifier of the given mapping element;

n1i is the i-th node of the first graph;

n2j is the j-th node of the second graph;

R specifies a semantic relation between the concepts at the given nodes.

Semantic Matching: Given two graphs G1 and G2, for any node n1i ∈ G1, find the strongest semantic relation R holding with node n2j ∈ G2.

Computed R’s, listed in decreasing order of binding strength: equivalence {=}; more general/specific {⊒, ⊑}; mismatch {⊥}; overlapping {⊓}.
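The ordering of the relations can be illustrated with concepts given by their extensions in one fixed interpretation (plain Python sets). This is a deliberate simplification: semantic matching in the slides works with logical validity, not with a single interpretation. The function names and the sample extensions are hypothetical.

```python
def strongest_relation(ext1, ext2):
    """Strongest relation between two extensions, tried in the order
    equivalence, more specific / more general, mismatch, overlap."""
    if ext1 == ext2:
        return "="
    if ext1 < ext2:          # proper subset: first concept is more specific
        return "⊑"
    if ext1 > ext2:          # proper superset: first concept is more general
        return "⊒"
    if not (ext1 & ext2):    # no common element
        return "⊥"
    return "⊓"


def mapping_element(i, j, ext1, ext2):
    """Build a 4-tuple < IDij, n1i, n2j, R > for nodes i and j."""
    return ("ID%d%d" % (i, j), i, j, strongest_relation(ext1, ext2))


pictures = {"p1", "p2", "p3"}
images = {"p1", "p2", "p3"}
italy_pictures = {"p1"}
wine = {"w1"}
mixed = {"p1", "w1"}

element = mapping_element(2, 2, pictures, images)
```

Because the checks run from strongest to weakest, a pair of equal extensions is reported as "=" even though it also satisfies both inclusions.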

79

Familiar?

Page 80: L ogics  for  D ata  and  K nowledge R epresentation

Example

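[Figure: matching example between two trees. T1: Europe (1) with children Pictures (2) and Wine and Cheese (3); Pictures has children Italy (4) and Austria (5). T2: Images (1) with child Europe (2), which has children Italy (3) and Austria (4). Mapping elements shown for node 2 of T1: < ID21, 2, 1, ⊑ >, < ID22, 2, 2, = >, < ID24, 2, 4, ⊒ >.]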

Page 81: L ogics  for  D ata  and  K nowledge R epresentation

S-Match Algorithm

Four Macro Steps

Given two labeled trees T1 and T2, do:

1. For all labels in T1 and T2 compute concepts at labels

2. For all nodes in T1 and T2 compute concepts at nodes

3. For all pairs of labels in T1 and T2 compute relations between concepts at labels

4. For all pairs of nodes in T1 and T2 compute relations between concepts at nodes

Steps 1 and 2 constitute the preprocessing phase, and are executed once and then each time after the schema/ontology is changed (OFF-LINE part). This is the SAME as the procedure of formalizing a classification into a lightweight ontology as discussed in the last lecture.

Steps 3 and 4 constitute the matching phase, and are executed every time the two schemas/ontologies are to be matched (ON-LINE part).

81

Page 82: L ogics  for  D ata  and  K nowledge R epresentation

Step 1: Compute concepts at labels

The idea:

Translate natural language expressions into an internal formal language.

Compute concepts based on possible senses of words in a label and their interrelations.

Preprocessing:

Tokenization. Labels are parsed into tokens (according to punctuation, spaces, etc.). E.g., Wine and Cheese → <Wine, and, Cheese>;

Lemmatization. Tokens are morphologically analyzed in order to find all their possible basic forms. E.g., Images → Image;

Building atomic concepts. An oracle (WordNet) is used to extract senses of lemmatized tokens. E.g., Image has 8 senses, 7 as a noun and 1 as a verb;

Building complex concepts. Prepositions, conjunctions, etc. are translated into logical connectives and used to build complex concepts out of the atomic concepts.

E.g., CWine and Cheese = <Wine, ⋃(WNWine)> ⊔ <Cheese, ⋃(WNCheese)>,

where ⋃ is a union of the senses that WordNet attaches to lemmatized tokens.

82

Page 83: L ogics  for  D ata  and  K nowledge R epresentation

Step 2: Compute concepts at nodes

The idea: extend concepts at labels by capturing the knowledge residing in the structure of a graph, in order to define the context in which the given concept at a label occurs.

Computation (basic case): the concept at a node n is computed as the intersection of the concepts at labels located above the given node, including the node itself.

[Figure: example tree with numbered nodes: 1 Europe, 2 Pictures, 3 Wine and Cheese, 4 Italy, 5 Austria]

C4 = CEurope ⊓ CPictures ⊓ CItaly
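A minimal sketch of the basic case follows: walk from a node up to the root and conjoin the concepts at labels met on the way. The parent links in `PARENT` are an assumption reconstructed from the node numbering and from the formula for C4 (Europe at the root, Pictures and Wine and Cheese below it, Italy and Austria below Pictures); the names `PARENT`, `LABEL` and `concept_at_node` are hypothetical.

```python
# Parent links of the example tree. The shape is an assumption reconstructed
# from the node numbering and from the formula given for C4.
PARENT = {1: None, 2: 1, 3: 1, 4: 2, 5: 2}
LABEL = {1: "Europe", 2: "Pictures", 3: "Wine and Cheese", 4: "Italy", 5: "Austria"}


def concept_at_node(node):
    """Conjoin the concepts at labels on the path from the root down to `node`."""
    path = []
    while node is not None:
        path.append("C_" + LABEL[node])
        node = PARENT[node]
    # The path was collected bottom-up, so reverse it to read root first.
    return " ⊓ ".join(reversed(path))


print(concept_at_node(4))  # C_Europe ⊓ C_Pictures ⊓ C_Italy
```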


Step 3: Compute relations between concepts at labels

The idea: exploit a priori knowledge (e.g., lexical or domain knowledge) with the help of element-level semantic matchers

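Results of step 3:

[Figure: the two trees to be matched. T1 has nodes 1 Europe, 2 Pictures, 3 Wine and Cheese, 4 Italy, 5 Austria. T2 has nodes 1 Images, 2 Europe, 3 Italy, 4 Austria.]

[Table: matrix of relations between concepts at labels, with columns for T1 (CEurope, CPictures, CItaly, CAustria, CWine, CCheese) and rows for T2 (CItaly, CAustria, CEurope, CImages). In this transcript only one "=" entry survives in each of the four rows; the cell positions and any other entries are not recoverable.]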
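An element-level matcher can be illustrated with a toy lexical oracle. The sketch below is an assumption-laden illustration, not the S-Match matcher library: `SYNONYMS` and `BROADER` are invented stand-ins for the lexical knowledge an oracle such as WordNet would supply, and the function names are hypothetical.

```python
# Hypothetical toy oracle standing in for lexical/domain knowledge.
SYNONYMS = [{"pictures", "images"}]                 # words sharing a sense
BROADER = {"italy": "europe", "austria": "europe"}  # narrower -> broader term


def is_narrower(a, b):
    """True if `b` is reachable from `a` by following broader-term links."""
    while a in BROADER:
        a = BROADER[a]
        if a == b:
            return True
    return False


def label_relation(a, b):
    """Return '=', '⊑', '⊒', or None when the oracle knows no relation."""
    if a == b or any(a in group and b in group for group in SYNONYMS):
        return "="
    if is_narrower(a, b):
        return "⊑"
    if is_narrower(b, a):
        return "⊒"
    return None


def clab_matrix(labels1, labels2):
    """Relation for every pair of labels from the two trees."""
    return {(a, b): label_relation(a, b) for a in labels1 for b in labels2}


matrix = clab_matrix(
    ["europe", "pictures", "italy", "austria", "wine", "cheese"],
    ["images", "europe", "italy", "austria"],
)
```

With this toy oracle, `matrix[("pictures", "images")]` is `"="` and `matrix[("italy", "europe")]` is `"⊑"`, while pairs the oracle knows nothing about map to `None`.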


Don’t hurry to Step 4!

What have we done in Step 3? Found semantic relations.

What for? To build the base of semantic relations used for further matching.

What is a ‘relation base’ in the logical sense? A TBox! We are building the TBox!


Step 4: Compute relations between concepts at nodes

The idea: reduce the matching problem to a validity problem

Context: the relations between concepts at labels

Wffrel(C1i, C2j): the relation to be proved (with C1i in Tree1 and C2j in Tree2)

Translate into propositional logic:

  C1i = C2j is translated into C1i ↔ C2j

  C1i ⊑ C2j (subsumption) is translated into C1i → C2j, and C1i ⊒ C2j into C2j → C1i

  C1i ⊥ C2j is translated into ¬(C1i ∧ C2j)

Prove that Context → Wffrel(C1i, C2j) is valid, with rel ∈ {=, ⊑, ⊒, ⊥, ⊓}

A propositional formula is valid iff its negation is unsatisfiable (SAT deciders are sound and complete…)
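The reduction to validity can be tried out with a brute-force truth-table check. The sketch below is a toy substitute for a real SAT decider and is only practical for a handful of atoms; formulas are represented as Python callables over a truth assignment, and all names (`is_satisfiable`, `is_valid`, `TRANSLATE`) are hypothetical.

```python
from itertools import product


def is_satisfiable(formula, atoms):
    """Brute-force SAT: try every truth assignment to `atoms`."""
    return any(
        formula(dict(zip(atoms, values)))
        for values in product([False, True], repeat=len(atoms))
    )


def is_valid(formula, atoms):
    """A formula is valid iff its negation is unsatisfiable."""
    return not is_satisfiable(lambda v: not formula(v), atoms)


def implies(p, q):
    return (not p) or q


# Translation of each semantic relation into a propositional connective.
TRANSLATE = {
    "=": lambda a, b: a == b,          # a ↔ b
    "⊑": lambda a, b: implies(a, b),   # a → b
    "⊒": lambda a, b: implies(b, a),   # b → a
    "⊥": lambda a, b: not (a and b),   # ¬(a ∧ b)
}

atoms = ["Italy", "Pictures", "Europe"]
# Context: the label-level fact Italy ⊑ Europe.
context = lambda v: implies(v["Italy"], v["Europe"])
# Goal: (Pictures ⊓ Italy) ⊑ Europe.
goal = lambda v: TRANSLATE["⊑"](v["Pictures"] and v["Italy"], v["Europe"])

with_context = is_valid(lambda v: implies(context(v), goal(v)), atoms)
without_context = is_valid(goal, atoms)
```

With the context the implication holds under every assignment, so `with_context` is `True`; the bare goal fails when Pictures and Italy are true and Europe is false, so `without_context` is `False`.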


Pseudo code of Step 4

i, j, N1, N2: int;
context, goal: wff;
n1, n2: node;
T1, T2: tree of (node);
relation = {=, ⊑, ⊒, ⊥};
ClabMatrix(N1, N2), CnodMatrix(N1, N2), relation: relation

function mkCnodMatrix(T1, T2, ClabMatrix) {
  for (i = 0; i < N1; i++) do
    for (j = 0; j < N2; j++) do
      CnodMatrix(i, j) := NodeMatch(T1(i), T2(j), ClabMatrix)}

function NodeMatch(n1, n2, ClabMatrix) {
  context := mkcontext(n1, n2, ClabMatrix, context);
  foreach (relation in <=, ⊑, ⊒, ⊥>) do {
    goal := w2r(mkwff(relation, GetCnod(n1), GetCnod(n2)));
    if VALID(mkwff(→, context, goal))
      return relation;}
  return IDK;}

Validity reasoning problem: Context ⊨ goal?
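The NodeMatch loop can be made runnable as follows. This is a sketch under assumptions rather than the S-Match implementation: concepts at nodes and the context are passed in as Python callables over a truth assignment, validity is checked by enumerating all assignments instead of calling a SAT decider, and the names `valid`, `mkwff` and `node_match` only loosely mirror the pseudocode.

```python
from itertools import product

RELATIONS = ["=", "⊑", "⊒", "⊥"]  # tested in this order, as in the pseudocode


def valid(formula, atoms):
    """True iff `formula` holds under every truth assignment to `atoms`."""
    return all(
        formula(dict(zip(atoms, values)))
        for values in product([False, True], repeat=len(atoms))
    )


def mkwff(rel, c1, c2):
    """Translate `c1 rel c2` into a propositional formula (a callable)."""
    if rel == "=":
        return lambda v: c1(v) == c2(v)
    if rel == "⊑":
        return lambda v: (not c1(v)) or c2(v)
    if rel == "⊒":
        return lambda v: (not c2(v)) or c1(v)
    if rel == "⊥":
        return lambda v: not (c1(v) and c2(v))
    raise ValueError(rel)


def node_match(cnod1, cnod2, context, atoms):
    """Return the first relation for which context → goal is valid, else IDK."""
    for rel in RELATIONS:
        goal = mkwff(rel, cnod1, cnod2)
        if valid(lambda v: (not context(v)) or goal(v), atoms):
            return rel
    return "IDK"


# Two small trees: Images > Europe on one side, Europe > Pictures on the other.
atoms = ["Images1", "Europe1", "Europe2", "Pictures2"]
context = lambda v: (v["Images1"] == v["Pictures2"]) and (v["Europe1"] == v["Europe2"])
c_first = lambda v: v["Images1"] and v["Europe1"]     # Images ⊓ Europe
c_second = lambda v: v["Europe2"] and v["Pictures2"]  # Europe ⊓ Pictures

result = node_match(c_first, c_second, context, atoms)
```

Under this context the two node concepts are equivalent in every assignment, so `result` is `"="`. Because the relations are tried in a fixed order, equivalence is reported in preference to the weaker subsumption relations that also hold.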


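Example

Suppose we want to check whether C12 = C22.

[Table: matrix of relations between concepts at nodes, with rows C11 to C14 and columns C21 to C25; the surviving entries are "=" in the rows for C12 and C14.]

Context: (C1Images ↔ C2Pictures) ∧ (C1Europe ↔ C2Europe)

Goal: C12 ↔ C22

Formula to be proved valid: (C1Images ↔ C2Pictures) ∧ (C1Europe ↔ C2Europe) → (C12 ↔ C22)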


Examples

[Figure: four panels, each showing the trees T1 (nodes 1 to 5: Europe, Pictures, Wine and Cheese, Italy, Austria) and T2 (nodes 1 to 4: Images, Europe, Italy, Austria) with a relation computed between a pair of nodes. Only the "=" of the first panel survives in this transcript.]


Short Summary

What is done in Step 4? Implicit semantic relations are found.

How? By validity reasoning based on the TBox built in Step 3!

What is working at the back end of semantic matching? ClassL reasoning!


A Real S-Match System Structure


Testing Methodology

Measuring match quality

  Expert mappings are inherently subjective

  Two degrees of freedom: directionality, use of oracles

Indicators

  Precision, [0,1] (correctness criterion: how many false positives?)

  Recall, [0,1] (completeness criterion: how many false negatives?)

  Overall, [-1,1]

  F-measure, [0,1]

  Time, sec.

Matching systems: S-Match vs. Cupid, COMA and SF as implemented in Rondo
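The slides do not spell out the formulas behind the indicators. The sketch below assumes the definitions commonly used in schema-matching evaluations: precision and recall over sets of mappings, F-measure as their harmonic mean, and Overall = Recall · (2 − 1/Precision), computed here in the equivalent form (2·TP − |found|) / |expected| to avoid dividing by a zero precision. The function name and the sample mappings are hypothetical.

```python
def match_quality(found, expected):
    """Compare the mappings found by a matcher with the expert mappings."""
    found, expected = set(found), set(expected)
    true_pos = len(found & expected)

    precision = true_pos / len(found) if found else 0.0
    recall = true_pos / len(expected) if expected else 0.0
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    # Assumed definition: Overall = Recall * (2 - 1/Precision), rewritten as
    # (2*TP - |found|) / |expected| so that precision 0 needs no special case.
    overall = (2 * true_pos - len(found)) / len(expected) if expected else 0.0

    return {"precision": precision, "recall": recall,
            "f_measure": f_measure, "overall": overall}


# Invented sample: 10 expert mappings; the matcher returns 8 of them plus 2 wrong ones.
expert = {("a%d" % i, "b%d" % i) for i in range(10)}
matcher = {("a%d" % i, "b%d" % i) for i in range(8)} | {("a8", "b0"), ("a9", "b1")}
scores = match_quality(matcher, expert)
```

For this sample precision, recall and F-measure are all 0.8 and Overall is 0.6. Overall turns negative as soon as more than half of the returned mappings are wrong, which is why its range extends below zero.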


Preliminary Experimental Results

[Figure: bar chart "Average Results" comparing Rondo, Cupid, COMA and S-Match on Precision, Recall, Overall and F-measure (axis from 0.0 to 1.0) and on Time (axis from 0.0 to 12.0 sec).]

Three experiments, test cases from different domains

Some characteristics of test cases: #nodes 4-39, depth 2-3

PC: Pentium IV 1.7 GHz; 256 MB RAM; Windows XP


References & Credits

References:

F. Giunchiglia, P. Shvaiko. “Semantic matching.” Knowledge Engineering Review, 18(3):265-280, 2003.

F. Giunchiglia, M. Marchese, I. Zaihrayeu. “Encoding Classifications into Lightweight Ontologies.” J. of Data Semantics VIII, Springer-Verlag LNCS 4380, pp. 57-81, 2007.

F. Giunchiglia, I. Zaihrayeu. “Lightweight Ontologies.” Encyclopedia of Database Systems, Springer-Verlag, 2008. Available as a DIT Technical Report.
