Ontology Mining by exploiting Machine Learning for Semantic Data Management

Claudia d’Amato

Department of Computer Science, University of Bari

BDA 2017, Nancy, France - November 15, 2017

Introduction & Motivation

Semantic data management: a range of methods and techniques for the manipulation and usage of data based on its meaning

C. d’Amato (UniBa) Machine Learning for Ontology Mining BDA 2017 2 / 59


Semantic Web Goal: making data on the Web machine understandable

Data meaning needs to be

explicit

formally defined


Ontologies ⇒ basic element for realizing semantic interoperability

on the Web and in other contexts

act as a shared vocabulary for assigning data semantics

Examples of existing real ontologies

Schema.org

Gene Ontology

Foundational Model of Anatomy ontology

Financial Industry Business Ontology (by OMG Finance Domain Task Force)

GoodRelations

. . .


Reasoning on Description Logics Ontologies

OWL adopted ⇒ Description Logics theoretical foundation

Ontologies are equipped with deductive reasoning capabilities ⇒ knowledge that is implicit within them can be made explicit

Deduction: "Credit du Nord" and "Credit Agricole" are also Company

Question: would it be possible to discover new/additional knowledge by exploiting the evidence coming from the assertional data?

Incompleteness: "UniCredit" is a Bank

Inconsistency: Mellon cannot be a Person and a Bank

Noise: Person ≡ ¬Bank missing

Idea: exploiting Machine Learning methods for Ontology Mining related tasks [d’Amato et al. @ SWJ’10]

Basics

Definition (Ontology Mining)

All activities that allow for discovering hidden knowledge from ontological knowledge bases

Special Focus on:

(similarity-based) inductive learning methods

use specific examples to reach general conclusions

are known to be very efficient and fault-tolerant


Induction vs. Deduction

Deduction (Truth preserving)

Given:

a set of general axioms

a proof procedure

Draw:

correct and certain conclusions

Induction (Falsity preserving)

Given:

a set of examples

Determine:

a possible/plausible generalization covering

the given examples/observations

new and not previously observed examples


Ontology Mining Tasks

Instance Retrieval (Instance Level)

Ontology Enrichment (Schema Level)

Concept Drift and Novelty Detection (Ontology Dynamic)

from an inductive perspective

Focus on: similarity-based methods


Instance Retrieval as a Classification Problem

Introducing Instance Retrieval I

Instance Retrieval → Finding the extension of a query concept

Instance Retrieval (Bank) = {"Credit du Nord", "Credit Agricole"}


Problem: Instance Retrieval in incomplete/inconsistent/noisy ontologies


Issues & Solutions I

IDEA

Casting the problem as a Machine Learning classification problem

assess the class membership of individuals in a Description Logic (DL) KB w.r.t. the query concept

State-of-the-art classification methods cannot be straightforwardly applied:

generally applied to feature vector representations → need to be upgraded to expressive DL representations

implicit Closed World Assumption made in ML → need to cope with the Open World Assumption made in DLs

classes considered as disjoint → disjointness of all concepts cannot be assumed


Issues & Solutions II

Adopted Solutions:

Defined new semantic similarity measures for DL representations

to cope with the high expressive power of DLs

to deal with the semantics of the compared objects (concepts, individuals, ontologies)

to convey the underlying semantics of the KB

Formalized a set of criteria that a similarity function has to satisfy in order to be considered semantic [d’Amato et al. @ EKAW 2008]

Definition of the classification problem taking into account the OWA

Multi-class classification problem decomposed into a set of smaller classification problems


Definition (Problem Definition)

Given:

a populated ontological knowledge base K = ⟨T, A⟩

a query concept Q

a training set with {+1, −1, 0} as target values

Learn a classification function f such that, ∀a ∈ Ind(A):

f(a) = +1 if a is an instance of Q

f(a) = −1 if a is an instance of ¬Q

f(a) = 0 otherwise (unknown classification because of the OWA)

Dual Problem

given an individual a ∈ Ind(A), tell which concepts C1, . . . , Ck in K it belongs to

the multi-class classification problem is decomposed into a set of ternary classification problems (one per target concept)
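The ternary target values can be illustrated with a small sketch. It is not the author's implementation: the sets `bank_pos` and `bank_neg` are hypothetical stand-ins for what a DL reasoner would entail for the query concept Bank, and the function names are made up for illustration.

```python
from typing import Dict, Iterable, Set


def ternary_label(individual: str, entailed_pos: Set[str], entailed_neg: Set[str]) -> int:
    """Return +1, -1 or 0 for one individual w.r.t. a query concept Q."""
    if individual in entailed_pos:      # K entails Q(a)
        return +1
    if individual in entailed_neg:      # K entails (not Q)(a)
        return -1
    return 0                            # neither is entailed: unknown under the OWA


def build_training_set(individuals: Iterable[str],
                       entailed_pos: Set[str],
                       entailed_neg: Set[str]) -> Dict[str, int]:
    """Label every individual with one of the three target values."""
    return {a: ternary_label(a, entailed_pos, entailed_neg) for a in individuals}


# Hypothetical toy data for the query concept Bank (assumed, not taken from a real KB).
individuals = ["CreditDuNord", "CreditAgricole", "Mellon", "UniCredit"]
bank_pos = {"CreditDuNord", "CreditAgricole"}   # assumed: K entails Bank(a)
bank_neg = {"Mellon"}                           # assumed: K entails (not Bank)(a)
labels = build_training_set(individuals, bank_pos, bank_neg)
```

The point of the third value is that the absence of an assertion is not read as a negative example, which is what a closed-world learner would do.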


Developed methods

Pioneering the Problem

relational k-NN for DL KBs [d’Amato et al. @ ESWC’08]

Improving the effectiveness/efficiency

kernel functions for kernel methods to be applied to DL KBs [Fanizzi, d’Amato et al. @ ISMIS’06, JWS 2012; Bloehdorn and Sure @ ISWC’07]

Scaling on large datasets

Statistical Relational Learning methods for large scale and data sparseness [Huang et al. @ ILP’10, Minervini et al. @ ICMLA’15]


Example: Nearest Neighbor Classification

query concept: Bank, k = 7

target values standing for the class values: {+1, 0, −1}

[Figure: the query individual xq surrounded by training individuals labelled +1, −1 or 0]

class(xq) ← ?

class(xq) ← +1
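A minimal sketch of the nearest-neighbour decision with three class values follows. It assumes a plain majority vote over precomputed dissimilarities; the cited relational k-NN method may weight votes differently, and the dissimilarity values below are invented for illustration.

```python
from collections import Counter
from typing import List, Tuple


def knn_classify(neighbours: List[Tuple[float, int]], k: int = 7) -> int:
    """Classify a query individual from (dissimilarity, label) pairs.

    Labels are the ternary target values +1, 0, -1. The k training
    individuals closest to the query individual each cast one vote.
    """
    nearest = sorted(neighbours, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical dissimilarities between xq and ten training individuals.
neighbours = [
    (0.10, +1), (0.15, +1), (0.20, -1), (0.22, +1), (0.30, 0),
    (0.35, +1), (0.40, 0), (0.55, -1), (0.60, 0), (0.70, 0),
]
predicted = knn_classify(neighbours, k=7)
```

With these values the seven nearest neighbours vote four times +1, twice 0 and once −1, so xq is assigned to Bank. In a DL setting the dissimilarity would come from a semantic measure over individuals rather than from a feature-vector distance.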


Lesson Learnt from experiments I

Experiments performed on publicly available ontologies

Results compared with a standard deductive reasoner

Registered mismatches: the inductive classifier returns a definite value in {+1, −1} while deduction gives no result

Such a case is counted as a mistake if precision and recall are used, while it could turn out to be a correct inference when judged by a human

Need for new metrics → defined to distinguish induced assertions from mistakes

                          Reasoner
                          +1    0    −1
Inductive Classifier  +1  M     I    C
                       0  O     M    O
                      −1  C     I    M

M = Match Rate, O = Omission Error Rate, C = Commission Error Rate, I = Induction Rate
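The four rates in the matrix above can be computed as in the following sketch. The function name and the sample pairs are hypothetical; only the mapping from (inductive, reasoner) outcomes to M, O, C and I follows the table.

```python
from typing import Dict, Iterable, Tuple


def evaluation_rates(pairs: Iterable[Tuple[int, int]]) -> Dict[str, float]:
    """Compute match, commission, omission and induction rates (in percent).

    Each pair is (inductive_label, reasoner_label), labels in {+1, 0, -1}.
    """
    counts = {"match": 0, "commission": 0, "omission": 0, "induction": 0}
    total = 0
    for inductive, reasoner in pairs:
        total += 1
        if inductive == reasoner:
            counts["match"] += 1        # same answer, including 0 vs 0
        elif inductive == 0:
            counts["omission"] += 1     # classifier abstains, reasoner decides
        elif reasoner == 0:
            counts["induction"] += 1    # classifier decides, reasoner cannot
        else:
            counts["commission"] += 1   # opposite definite answers
    if total == 0:
        raise ValueError("no classification outcomes given")
    return {name: 100.0 * n / total for name, n in counts.items()}


# Hypothetical outcomes for eight test individuals.
rates = evaluation_rates(
    [(1, 1), (0, 0), (1, 0), (0, -1), (-1, 1), (1, 1), (-1, -1), (0, 0)]
)
```

Keeping induction separate from commission is the design choice that matters here: a definite answer where the reasoner is silent is a candidate new assertion, not an error.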


Lesson Learnt from experiments II

Commission error rate almost zero on average

Omission error rate very low and non-zero only in some cases

non-zero for ontologies in which disjointness axioms are missing

Induction rate not zero

new knowledge (not logically derivable) induced ⇒ can be used for semi-automating the ontology population task

Ontology     match         commission    omission      induction
SWM          97.5 ± 3.2    0.0 ± 0.0     2.2 ± 3.1     0.3 ± 1.2
LUBM         99.5 ± 0.7    0.0 ± 0.0     0.5 ± 0.7     0.0 ± 0.0
NTN          97.5 ± 1.9    0.6 ± 0.7     1.3 ± 1.4     0.6 ± 1.7
Financial    99.7 ± 0.2    0.0 ± 0.0     0.0 ± 0.0     0.2 ± 0.2


Ontology Enrichment as a Disjointness Axioms Discovery Problem

Disjointness axioms are often missing within ontologies

Problems:

introduction of noise

Noise: Person ≡ ¬Bank missing

counterintuitive inferences

K = {JournalPaper ⊑ Paper, ConferencePaper ⊑ Paper, ConferencePaper(a)}

K ⊨ JournalPaper(a)?

Answer: Unknown


Observation: extensions of disjoint concepts do not overlap

Question: would it be possible to automatically capture disjointness axioms by analyzing the data configuration/distribution?

Idea: Exploiting (Conceptual) clustering methods for the purpose

Definition (Problem Definition)

Given

an ontological knowledge base K = ⟨T, A⟩

a set of individuals I ⊆ Ind(A)

Find

n pairwise disjoint clusters {C1, . . . , Cn}

for each i = 1, . . . , n, a concept description Di that describes Ci, such that:

∀a ∈ Ci : K ⊨ Di(a)

∀b ∈ Cj, j ≠ i : K ⊨ ¬Di(b)

Hence ∀Di, Dj, i ≠ j : K ⊨ Dj ⊑ ¬Di.


Basics on Clustering Methods

Clustering methods: unsupervised inductive learning methods that organize a collection of unlabeled resources into meaningful clusters such that

intra-cluster similarity is high

inter-cluster similarity is low


Learning Disjointness Axioms: Developed Methods

Statistical-based approach

NAR - exploiting negative association rules [Fleischhacker et al. @ OTM’11]

PCC - exploiting Pearson’s correlation coefficient [Völker et al. @ JWS 2015]

do not exploit any background knowledge


Terminological Cluster Tree

Defined a method for eliciting disjointness axioms [Rizzo et al. @ ESWC’17]

solving a clustering problem via learning Terminological Cluster Trees

providing a concept description for each cluster

Definition (Terminological cluster tree (TCT))

A binary logical tree where

a node stands for a cluster of individuals C

each inner node contains a description D (over the signature of K)

the departing edges correspond to the positive (left) and negative (right) examples of D


Example of TCT

Given I ⊆ Ind(A), an example of TCT describing individuals in the Semantic Web research community


Collecting Disjointness Axioms

Given a TCT T:

Step I:

Traverse T to collect the concept descriptions describing the clusters at the leaves

A set of concepts CS is obtained

Step II:

A set of candidate axioms A is generated from CS: an axiom D ⊑ ¬E (D, E ∈ CS) is generated if

D ≢ E (or D ⋢ E or vice versa)

E ⊑ ¬D has not been generated
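Step II can be sketched as follows. This is an illustration under assumptions, not the published procedure: concepts are plain strings, and `equivalent` is a hypothetical predicate standing in for the reasoner check (only the equivalence test is modelled, not the subsumption variant).

```python
from itertools import combinations
from typing import Callable, List, Sequence, Tuple


def candidate_disjointness_axioms(
    concepts: Sequence[str],
    equivalent: Callable[[str, str], bool],
) -> List[Tuple[str, str]]:
    """Return pairs (D, E), each read as the candidate axiom D ⊑ ¬E.

    Every unordered pair of distinct concepts is visited once, so the
    symmetric axiom E ⊑ ¬D is never generated a second time.
    """
    unique = list(dict.fromkeys(concepts))      # drop repeated descriptions, keep order
    axioms = []
    for d, e in combinations(unique, 2):
        if not equivalent(d, e):                # skip pairs the KB treats as the same concept
            axioms.append((d, e))
    return axioms


# Hypothetical concept set and equivalence (Paper ≡ Article is assumed for the example).
CS = ["Person", "Proceedings", "Paper", "Article"]
same = {frozenset({"Paper", "Article"})}
axioms = candidate_disjointness_axioms(CS, lambda d, e: frozenset({d, e}) in same)
```

The resulting pairs are only candidates; each would still have to be checked against the knowledge base, for instance for the inconsistencies counted in the experiments.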


Collecting Disjointness Axioms: Example

CS = { Person, Person ⊓ ∃hasPublication.⊤, ¬(Person ⊓ ∃hasPublication.⊤), Person ⊓ ∃hasPublication.SWPaper, ¬Proceedings, ¬Person ⊓ Proceedings, · · · }

Axiom1: Person ⊓ ∃hasPublication.SWPaper ⊑ ¬(¬Proceedings)

Axiom2: · · ·


Inducing a TCT

Given the set of individuals I and the ⊤ concept

Divide-and-conquer approach adopted

Base Case: test the stopCondition

the cohesion of the cluster I exceeds a threshold ν

distance between medoids below a threshold ν

Recursive Step (stopCondition does not hold):

a set S of refinements of the current (parent) description C is generated

the bestConcept E∗ ∈ S is selected and installed as current node

the one showing the best cluster separation ⇔ with max distance between the medoids of its positive (P) and negative (N) individuals

I is split into:

Ileft ⊆ I ↔ individuals with the smallest distance w.r.t. the medoid of P

Iright ⊆ I ↔ individuals with the smallest distance w.r.t. the medoid of N

Note: Number of clusters not required - obtained from data distribution
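The recursion above can be sketched on toy data. Several simplifications are assumptions of this sketch and not part of the described method: individuals are binary feature vectors, the distance is a normalised Hamming distance, the refinement operator is replaced by a fixed pool of candidate concepts, membership is two-valued, and only the medoid-distance stop condition is implemented.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

Individual = Tuple[int, ...]                        # toy encoding of an individual
Concept = Tuple[str, Callable[[Individual], bool]]  # (name, membership test)


def distance(a: Individual, b: Individual) -> float:
    """Normalised Hamming distance, a stand-in for a semantic dissimilarity."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def medoid(cluster: Sequence[Individual]) -> Individual:
    """Individual with the smallest total distance to the rest of the cluster."""
    return min(cluster, key=lambda a: sum(distance(a, b) for b in cluster))


@dataclass
class Node:
    individuals: List[Individual]
    concept: Optional[str] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def induce_tct(individuals: Sequence[Individual], candidates: Sequence[Concept], nu: float) -> Node:
    best = None
    for name, test in candidates:
        pos = [a for a in individuals if test(a)]
        neg = [a for a in individuals if not test(a)]
        if pos and neg:
            m_pos, m_neg = medoid(pos), medoid(neg)
            gap = distance(m_pos, m_neg)
            if best is None or gap > best[0]:       # keep the best cluster separation
                best = (gap, name, m_pos, m_neg)
    # Base case: nothing separates the cluster, or the medoids are closer than nu.
    if best is None or best[0] < nu:
        return Node(list(individuals))
    _, name, m_pos, m_neg = best
    left = [a for a in individuals if distance(a, m_pos) <= distance(a, m_neg)]
    right = [a for a in individuals if distance(a, m_pos) > distance(a, m_neg)]
    rest = [c for c in candidates if c[0] != name]  # each concept is used once per path
    return Node(list(individuals), name, induce_tct(left, rest, nu), induce_tct(right, rest, nu))


def leaves(node: Node) -> List[List[Individual]]:
    """Clusters found at the leaves of the tree."""
    if node.left is None or node.right is None:
        return [node.individuals]
    return leaves(node.left) + leaves(node.right)


# Hypothetical features: (isPerson, hasPublication, isProceedings).
I = [(1, 1, 0), (1, 1, 0), (1, 0, 0), (0, 0, 1), (0, 0, 1)]
pool = [("Person", lambda a: a[0] == 1), ("∃hasPublication.⊤", lambda a: a[1] == 1)]
tree = induce_tct(I, pool, nu=0.5)
```

On this data the root is split on Person and neither child is split further, so two clusters are obtained without fixing their number in advance, which mirrors the closing note of the slide.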


Lesson Learnt from experiments I

Experiments performed on publicly available ontologies

Goal I: Re-discover a target axiom (existing in K)

Setting:

A copy of each ontology is created removing a target axiom

Threshold ν = 0.9, 0.8, 0.7

Metrics: # discovered axioms and # cases of inconsistency

Results:

target axioms rediscovered in almost all cases

a significant number of additional disjointness axioms discovered

limited number of inconsistencies found

              TCT 0.9          TCT 0.8          TCT 0.7
Ontology      #inc.  #ax's     #inc.  #ax's     #inc.  #ax's
BioPax        2      53        2      53        3      52
NTN           10     70        9      73        10     75
Financial     0      125       0      126       0      127
GeoSkills     2      345       1      347       4      347
Monetary      0      432       0      432       0      433
DBPedia3.9    45     45        44     44        43     43


Lesson Learnt from experiments II

Goal II:

Re-discover randomly selected target axioms added according to the Strong Disjointness Assumption [Schlobach et al. @ ESWC 2005]

two sibling concepts in a subsumption hierarchy are considered as disjoint

comparative analysis with statistical-based methods [Völker et al. @ JWS 2015, Fleischhacker et al. @ OTM’11]

PCC - based on Pearson’s correlation coefficient

NAR - exploiting negative association rules

Setting:

A copy of each ontology is created removing 20%, 50%, 70% of the disjointness axioms

The copy is used to induce a TCT - ν = 0.9, 0.8, 0.7 - # Runs: 10

Metrics: rate of rediscovered target axioms, # cases of inconsistency, # additional discovered axioms


Lesson Learnt from experiments III

Results:

almost all axioms rediscovered

Rate decreases when larger fractions of axioms are removed, as expected

TCT outperforms PCC and NAR w.r.t. additionally discovered axioms whilst introducing limited inconsistency

TCT allows complex disjointness axioms to be expressed

PCC and NAR tackle only disjointness between concept names

Exploiting K as well as the data distribution improves disjointness axioms discovery


Example of axioms

Successfully discovered axioms

ExternalReferenceUtilityClass ⊓ ∃TAXONREF.⊤ disjoint with xref

Activity disjoint with Person ⊓ ∃nationality.United states

Person ⊓ hasSex.Male (≡ Man) disjoint with SupernaturalBeing ⊓ God (≡ God)

Ontology Enrichment as a Concept Learning Problem

On Learning Concept Descriptions I

Goal: Learning descriptions for a given concept name / expression

Example: Man ≡ Human ⊓ Male

Question: How to learn concept descriptions automatically, given a set of individuals?

IDEA

Regarding the problem as a supervised concept learning task

Supervised Concept Learning:

Given a training set of positive and negative examples for a concept name,

construct a description that will accurately classify whether future examples are positive or negative.


On Learning Concept Descriptions II

Definition (Problem Definition)

Given

the KB K as background knowledge

a subset pos of individuals as positive examples of C

a subset neg of individuals as negative examples of C

Learn

a DL concept description D so that the individuals in pos are instances of D while those in neg are not
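The learning goal can be made concrete with a scoring sketch. It is not any of the cited systems: individuals are modelled as sets of asserted concept names (a closed-world simplification), candidate descriptions are hypothetical membership tests, and the example reuses the Man ≡ Human ⊓ Male illustration.

```python
from typing import Callable, FrozenSet, List, Sequence, Tuple

Individual = FrozenSet[str]                               # asserted concept names
Description = Tuple[str, Callable[[Individual], bool]]    # (name, membership test)


def score(test: Callable[[Individual], bool],
          pos: Sequence[Individual],
          neg: Sequence[Individual]) -> float:
    """Fraction of examples a candidate description classifies correctly."""
    covered_pos = sum(1 for a in pos if test(a))
    rejected_neg = sum(1 for a in neg if not test(a))
    return (covered_pos + rejected_neg) / (len(pos) + len(neg))


def best_description(candidates: List[Description],
                     pos: Sequence[Individual],
                     neg: Sequence[Individual]) -> Description:
    """Pick the candidate covering most positives while rejecting most negatives."""
    return max(candidates, key=lambda c: score(c[1], pos, neg))


# Hypothetical examples for the target concept Man.
pos = [frozenset({"Human", "Male"}), frozenset({"Human", "Male", "Student"})]
neg = [frozenset({"Human", "Female"}), frozenset({"Dog", "Male"})]
candidates: List[Description] = [
    ("Human", lambda a: "Human" in a),
    ("Male", lambda a: "Male" in a),
    ("Human ⊓ Male", lambda a: "Human" in a and "Male" in a),
]
best = best_description(candidates, pos, neg)
```

Here Human and Male each misclassify one negative example, while their conjunction separates pos from neg completely. A real learner would search a refinement space and ask a reasoner for instance checks instead of enumerating a fixed candidate list.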


Developed Methods for Supervised Concept Learning

Separate-and-conquer approach

YinYang [Iannone et al. @ Appl. Intell. J. 2007]

DL-FOIL [Fanizzi et al. @ ILP 2008]

DL-Learner [Lehmann et al. @ MLJ 2010, SWJ 2011]

Divide-and-conquer approach

TermiTIS [Fanizzi et al. @ ECML 2010, Rizzo et al. @ ESWC 2015]


Separate and Conquer: Example

[Figure: positive examples (+) with the labels C1, C′1, C2 and C′2]

C1 = MasterStudent    C′1 = MasterStudent ⊓ ∃worksIn.⊤

C2 = BachelorStudent    C′2 = BachelorStudent ⊓ ∃worksIn.⊤
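
The separate-and-conquer strategy can be sketched as a generic sequential covering loop. The Python below is not DL-FOIL or any of the cited systems: it assumes each individual is described by a flat set of features (the feature "worksIn" stands in for ∃worksIn.⊤), and the data, the scoring heuristic and all function names are hypothetical.

```python
def learn_conjunction(pos, neg, kb, vocabulary):
    """Conquer step: specialize a conjunction until it covers no negative."""
    conj = frozenset()
    covered_pos, covered_neg = set(pos), set(neg)
    while covered_neg:
        best = None
        for atom in sorted(vocabulary - conj):
            p = {x for x in covered_pos if atom in kb[x]}
            n = {x for x in covered_neg if atom in kb[x]}
            if not p or len(n) == len(covered_neg):
                continue  # atom loses every positive or excludes no negative
            score = len(p) - len(n)  # simple illustrative heuristic
            if best is None or score > best[0]:
                best = (score, atom, p, n)
        if best is None:
            return None  # no further specialization is possible
        _, atom, covered_pos, covered_neg = best
        conj = conj | {atom}
    return conj, covered_pos


def separate_and_conquer(pos, neg, kb, vocabulary):
    """Build a disjunction of conjunctions that covers the positives."""
    remaining, disjuncts = set(pos), []
    while remaining:
        result = learn_conjunction(remaining, neg, kb, vocabulary)
        if result is None:
            break
        conj, covered = result
        disjuncts.append(conj)
        remaining -= covered  # separate step: drop the covered positives
    return disjuncts


# Hypothetical individuals; "worksIn" plays the role of ∃worksIn.⊤.
kb = {
    "a": {"MasterStudent", "worksIn"},
    "b": {"MasterStudent", "worksIn"},
    "c": {"BachelorStudent", "worksIn"},
    "d": {"MasterStudent"},
    "e": {"BachelorStudent"},
    "f": {"worksIn"},
}
vocabulary = {"MasterStudent", "BachelorStudent", "worksIn"}
pos, neg = {"a", "b", "c"}, {"d", "e", "f"}

learned = separate_and_conquer(pos, neg, kb, vocabulary)
print(" ⊔ ".join("(" + " ⊓ ".join(sorted(c)) + ")" for c in learned))
```

On this toy data the loop returns two disjuncts that mirror C′1 and C′2: the first conjunction covers the working master students, they are removed from the positives, and the second conjunction covers the remaining working bachelor student.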


Ontology Enrichment Ontology Enrichment as a Concept Learning Problem

On Evaluating the Learnt Concept Descriptions

Publicly available ontologies considered

A number (30) of satisfiable randomly generated concepts considered

Positive and negative examples collected for each concept by using a deductive reasoner

Running concept learning on the collected positive and negative examples

Inductive classification performed on the learnt concept descriptions

ontology     match rate     commission error rate   omission error rate   induction rate
BioPax       76.9 ± 15.7    19.7 ± 15.9             7.0 ± 20.0            7.5 ± 23.7
NTN          78.0 ± 19.2    16.1 ± 4.0              6.4 ± 8.1             14.0 ± 10.1
Financial    75.5 ± 20.8    16.1 ± 12.8             4.5 ± 5.1             3.7 ± 7.9
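
The slides do not spell out how the four indices are computed. The sketch below assumes the definitions commonly used when an inductive classifier is compared against a deductive reasoner with three-valued answers (+1 member, -1 non-member, 0 unknown): match when both agree, commission error when they give opposite definite answers, omission error when only the reasoner decides, induction when only the classifier decides. The function name and the sample labels are hypothetical.

```python
from collections import Counter


def compare(inductive, deductive):
    """Return the four rates (in %) for two equally long lists of labels in {+1, -1, 0}."""
    counts = Counter()
    for ind, ded in zip(inductive, deductive):
        if ind == ded:
            counts["match"] += 1
        elif ind == 0:
            counts["omission"] += 1    # reasoner decides, classifier abstains
        elif ded == 0:
            counts["induction"] += 1   # classifier decides, reasoner cannot
        else:
            counts["commission"] += 1  # opposite definite answers
    total = len(deductive)
    keys = ("match", "commission", "omission", "induction")
    return {k: 100.0 * counts[k] / total for k in keys}


# Hypothetical answers for eight individuals w.r.t. one query concept.
inductive = [1, -1, 0, 1, -1, 1, 0, -1]
deductive = [1, -1, 1, -1, 0, 1, 0, -1]

rates = compare(inductive, deductive)
print(rates)
```

Under these assumed definitions, a non-zero induction rate is not an error: it counts the assertions that the inductive method suggests and that cannot be derived deductively.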


Ontology Enrichment Ontology Enrichment as a Concept Learning Problem

Examples of Learned Concept Descriptions with DL-FOIL

BioPax

induced: Or( And( physicalEntity protein) dataSource)

original: Or( And( And( dataSource externalReferenceUtilityClass) ForAll(ORGANISM ForAll(CONTROLLED physicalInteraction))) protein)

NTN

induced: Or( EvilSupernaturalBeing Not(God))

original: Not(God)

Financial

induced: Or( Not(Finished) NotPaidFinishedLoan Weekly)

original: Or( LoanPayment Not(NoProblemsFinishedLoan))


Ontology Mining Tasks

Instance Retrieval (Instance Level)

Ontology Enrichment (Schema Level)

Concept Drift and Novelty Detection (Ontology Dynamics)

from an inductive perspective


Concept Drift and Novelty Detection as a Clustering Problem

Concept Drift and Novelty Detection

Ontologies evolve over time ⇒ new assertions are added.

Concept Drift

change of a concept towards a more general/specific one w.r.t. the evidence provided by new annotated individuals

almost all Worker individuals work more than 10 hours per day ⇒ HardWorker

Novelty Detection

isolated cluster in the search space that needs to be defined through new emerging concepts to be added to the KB

subset of Worker employed in a company ⇒ Employee

subset of Worker working for several companies ⇒ Free-lance

Idea: automatically capturing them by analyzing the data configuration/distribution

Research Direction

Exploiting (Conceptual) clustering methods for the purpose


Concept Drift and Novelty Detection as a Clustering Problem

Clustering Individuals of An Ontology: Developed Methods

Purely Logic-based

KLUSTER [Kietz & Morik, 94]

CSKA [Fanizzi et al., 04]

Produce a flat output

Suffer from noise in the data

Similarity-based ⇒ noise tolerant

Evolutionary Clustering Algorithm around Medoids [Fanizzi et al. @ IJSWIS 2008]

automatically assesses the best number of clusters

k-Medoid (hierarchical and fuzzy) clustering algorithm [Fanizzi et al. @ ESWC’08, Fundam. Inform.’10]

number of clusters required
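
Similarity-based methods only need a dissimilarity value for each pair of individuals, which is why medoids (actual individuals) are used instead of centroids. The sketch below is a plain k-medoids alternation, not the hierarchical, fuzzy or evolutionary variants cited above. It assumes a precomputed symmetric matrix `dist` with zero diagonal and strictly positive off-diagonal entries; the matrix, built here from hypothetical points on a line, would in practice come from a dissimilarity measure between individuals.

```python
def k_medoids(dist, k, max_iter=100):
    """Cluster items 0..n-1 given a symmetric dissimilarity matrix `dist`.

    Assumes dist[i][i] == 0 and dist[i][j] > 0 for i != j, so that every
    medoid is always assigned to its own cluster.
    """
    n = len(dist)
    medoids = list(range(k))  # naive deterministic initialization
    for _ in range(max_iter):
        # Assignment step: attach each item to its closest medoid.
        clusters = {m: [] for m in medoids}
        for i in range(n):
            closest = min(medoids, key=lambda m: dist[i][m])
            clusters[closest].append(i)
        # Update step: the new medoid minimizes the total dissimilarity
        # to the members of its cluster.
        new_medoids = [
            min(members, key=lambda c: sum(dist[c][j] for j in members))
            for members in clusters.values()
        ]
        if sorted(new_medoids) == sorted(medoids):
            break  # medoids are stable
        medoids = new_medoids
    return clusters  # maps each medoid to the members of its cluster


# Hypothetical data: six items whose dissimilarity is their distance on a line.
values = [0, 1, 2, 10, 11, 12]
dist = [[abs(a - b) for b in values] for a in values]

clusters = k_medoids(dist, k=2)
print(clusters)
```

As noted on the slide, k has to be supplied by the user here, which is the limitation that the evolutionary variant addresses.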


Concept Drift and Novelty Detection as a Clustering Problem

Automated Concept Drift and Novelty Detection 1/3


Concept Drift and Novelty Detection as a Clustering Problem

Automated Concept Drift and Novelty Detection 2/3

The new instances are considered to be a candidate cluster

An evaluation of it is performed for assessing its nature


Concept Drift and Novelty Detection as a Clustering Problem

Automated Concept Drift and Novelty Detection 3/3


Concept Drift and Novelty Detection as a Clustering Problem

Lesson Learnt from Experiments

Intensional descriptions learnt

by using DL concept learning algorithms

Clustering algorithms

applied on ontologies publicly available

evaluated using standard clustering validity indexes (e.g. generalized Dunn's index, cohesion index, Silhouette index)

Necessity of a domain expert/gold standard, particularly for validating the concept novelty/drift
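
Of the validity indexes mentioned above, the Silhouette index is the simplest to compute from pairwise dissimilarities alone. The sketch below uses its standard definition (for each item, a is the mean dissimilarity to its own cluster, b the smallest mean dissimilarity to another cluster, and the score is (b - a) / max(a, b)). It assumes at least two clusters and is not the evaluation code used in the experiments; the data are hypothetical.

```python
def silhouette(dist, clusters):
    """Mean silhouette over all items; `clusters` is a list of lists of indices."""
    scores = []
    for members in clusters:
        for i in members:
            if len(members) == 1:
                scores.append(0.0)  # usual convention for singleton clusters
                continue
            # a: mean dissimilarity to the other members of the same cluster
            a = sum(dist[i][j] for j in members if j != i) / (len(members) - 1)
            # b: smallest mean dissimilarity to any other cluster
            b = min(
                sum(dist[i][j] for j in other) / len(other)
                for other in clusters
                if other is not members
            )
            scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Hypothetical data: six items whose dissimilarity is their distance on a line.
values = [0, 1, 2, 10, 11, 12]
dist = [[abs(a - b) for b in values] for a in values]

good = silhouette(dist, [[0, 1, 2], [3, 4, 5]])
bad = silhouette(dist, [[0, 1, 2, 3], [4, 5]])
print(round(good, 3), round(bad, 3))
```

Values close to 1 indicate compact, well-separated clusters, so the partition that keeps the two natural groups apart scores higher than the one that misplaces an item.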


Concept Drift and Novelty Detection as a Clustering Problem Conclusions

Conclusions

Machine Learning methods

could be usefully exploited for ontology mining

suitable in case of incoherent/noisy KBs

can be seen as an additional layer on top of deductive reasoning for realizing new/additional forms of approximated reasoning capabilities

Future directions:

Semi-Supervised Learning methods particularly appealing for LOD

Special focus on scalability issues


That’s all!

Thank you

Nicola Fanizzi Giuseppe Rizzo Floriana Esposito


Concept Drift and Novelty Detection as a Clustering Problem Conclusions

Refinement Operators

Downward refinement operators specializing a concept C

ρ1 C′ = C ⊓ (¬)A;

ρ2 C′ = C ⊓ (¬)(∃)R.⊤;

ρ3 C′ = C ⊓ (¬)(∀)R.⊤;

ρ4 ∃R.C′i ∈ ρ(∃R.Ci) ∧ C′i ∈ ρ(Ci);

ρ5 ∀R.C′i ∈ ρ(∀R.Ci) ∧ C′i ∈ ρ(Ci).
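
A purely syntactic sketch of the first three operators is given below. It assumes a concept is a tuple of conjuncts written as strings and simply enumerates the candidate specializations. The recursive operators ρ4 and ρ5, which refine the fillers of existing restrictions, are left out, and no reasoner is used to discard unsatisfiable or redundant candidates. The function name and the vocabulary are hypothetical.

```python
def refine(concept, atomic_concepts, roles):
    """Yield specializations of `concept`, given as a tuple of conjuncts."""
    candidates = []
    for a in atomic_concepts:  # ρ1: add A or ¬A
        candidates += [a, f"¬{a}"]
    for r in roles:            # ρ2 and ρ3: add (¬)∃R.⊤ or (¬)∀R.⊤
        for quantifier in ("∃", "∀"):
            candidates += [f"{quantifier}{r}.⊤", f"¬{quantifier}{r}.⊤"]
    for c in candidates:
        if c not in concept:   # skip conjuncts that are already present
            yield concept + (c,)


# Hypothetical vocabulary: one atomic concept and one role.
refinements = list(refine(("MasterStudent",), ["Male"], ["worksIn"]))
for r in refinements:
    print(" ⊓ ".join(r))
```

Each yielded tuple adds exactly one conjunct, so every refinement is at least as specific as the concept it starts from, which is what makes the operator a downward one.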
