Reuse of Ontology Mappings

Post on 16-Jan-2017


1

REUSE OF ONTOLOGY MAPPINGS

Anika Groß, Database Group, Universität Leipzig

Canberra, March 2016

2

ONTOLOGIES

• Structured representation of knowledge
• Used for annotation as standardized semantic description of object properties
• Very large ontologies in the life sciences

[Figure: example domains (Anatomy, Molecular biology, Chemistry, Medicine) and an ontology excerpt with the concepts "Anatomic Structure, System, or Substance", Tissue, Organ, Lung, Skin, Kidney, …]

3

ONTOLOGY MAPPINGS

• Overlapping ontologies → creation of mappings/alignments
• Useful for data integration, analysis across sources …
• Ontology mapping: set of semantic correspondences (links) between concepts of different ontologies

[Figure: overlapping ontologies MeSH, GALEN, SNOMED CT, NCI Thesaurus, Uberon, Mouse Anatomy, FMA]

4

ONTOLOGY MAPPINGS

• Manual or semi-automatic identification (matching)

[Figure: example mapping OM_{O1,O2} between two anatomy ontologies. O1: body, head, neck, trunk, limbs, upper extremities, lower extremities, tail. O2: body, head, neck, limbs, limb segments, tail. The correspondences carry the labels = and <.]

5

EVOLUTION OF ONTOLOGIES AND MAPPINGS

• Ontologies are not static!
• Research, new knowledge → continuous changes
• Release of new versions
• Ontology changes
→ Impact on dependent mappings and applications?

[Figure: ontologies O1 and O2 connected by the mapping OM_{O1,O2}]

6

REUSE EXISTING MAPPINGS TO …

→ create new ontology mappings
  • “Indirect” matching: combine existing mappings to create new mappings between so far unconnected sources
→ create up-to-date ontology mappings
  • Migration of outdated mappings to currently valid ontology versions

1) Ontologies, ontology mappings, ontology evolution
2) Composition-based ontology matching
3) Adaptation of ontology mappings
4) Outlook

7

ONTOLOGY MATCHING WORKFLOW

• Manual creation of mappings between very large ontologies is too labor-intensive
• Semi-automatic generation of semantic correspondences: linguistic, structural, instance-based matching techniques

[Figure: match workflow. Inputs O1, O2 and further input (e.g. instances, dictionary) → Pre-processing → Matching → Post-processing → Mapping, e.g. sim(O1.a, O2.b) = 0.8, sim(O1.a, O2.c) = 0.5, sim(O1.c, O2.c) = 1.0]

8

MAPPING COMPOSITION

• Indirect composition-based matching
• Via intermediate ontology (IO): important hub ontology, synonym dictionary, …
• Reuse existing mappings to
  • find new correspondences via composition
  • increase match quality & save computation time

[Figure: O1 and O2 are each mapped to the intermediate ontology IO; the direct mapping between O1 and O2 is unknown (?)]

[Figure: example concepts MA_0001421, UBERON:0001092 and NCI_C32239 with their labels (Name: cervical vertebra 1; Name: atlas; Name: C1 Vertebra; Synonym: Atlas; Synonym: cervical vertebra 1; Synonym: C1 vertebra)]

Groß, Hartung, Kirsten, Rahm: Mapping Composition for Matching Large Life Science Ontologies. 2nd International Conference on Biomedical Ontology (ICBO), 2011

9

INDIRECT MATCHING

• Use mappings to intermediate ontologies IO1, …, IOk to indirectly match O1 and O2
• Reduce matching effort by reusing mappings to IO → very fast composition

[Figure: O1 and O2 connected via the intermediate ontologies IO1, IO2, …, IOk]
→ IO should have a significant overlap with O1 and O2
→ IO1, …, IOk may complement each other

[Figure: hub ontology HO connected to O1, O2, …, On and to a new ontology Onew]
→ Centralized hub HO
→ many mappings to other ontologies
→ Onew aligned with any Oi via HO

10

COMPOSE OPERATOR

• (Binary) compose operator
  • Composes two mappings M_{O1,IO} and M_{IO,O2} to create a new mapping M_{O1,O2}
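One common reading of a binary compose operator is a relational join over the shared intermediate concepts. The sketch below follows that reading and is not taken from the slides: representing a mapping as a dictionary from concept pairs to similarity values, aggregating similarities along a path with `min`, and keeping the best path with `max` are all assumptions made for illustration. The similarity values in the example are invented.

```python
from collections import defaultdict

# A mapping is modeled as {(source_concept, target_concept): similarity}.
Mapping = dict[tuple[str, str], float]


def compose(m_o1_io: Mapping, m_io_o2: Mapping) -> Mapping:
    """Join two mappings on the concepts of the intermediate ontology IO."""
    # Index the second mapping by its IO concept for fast lookup.
    by_io = defaultdict(list)
    for (c_io, c2), sim in m_io_o2.items():
        by_io[c_io].append((c2, sim))

    composed: Mapping = {}
    for (c1, c_io), sim1 in m_o1_io.items():
        for c2, sim2 in by_io.get(c_io, []):
            # Assumption: a path is only as good as its weakest link (min),
            # and the best of several paths wins (max).
            sim = min(sim1, sim2)
            composed[(c1, c2)] = max(sim, composed.get((c1, c2), 0.0))
    return composed


# Illustrative input using the concept identifiers from the example above.
m_ma_uberon = {("MA_0001421", "UBERON:0001092"): 1.0}
m_uberon_ncit = {("UBERON:0001092", "NCI_C32239"): 0.9}
print(compose(m_ma_uberon, m_uberon_ncit))
```

Because the join only touches correspondences that already exist, no label comparison between O1 and O2 is needed, which is where the time saving comes from.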

11

COMPOSEMATCH

Input: Two ontologies O1 and O2, list of intermediate ontologies IO = IO1, …, IOk, occurrence count occ
Output: Composed mapping CM_{O1,O2}

MapList ← empty
for each IOi ∈ IO do
    M_{O1,IOi} ← getMapping(O1, IOi)
    M_{IOi,O2} ← getMapping(IOi, O2)
    MapList.add(compose(M_{O1,IOi}, M_{IOi,O2}))
end for
return merge(MapList, occ)

[Figure: example with O1, O2 and two intermediate ontologies IO1 and IO2 (concepts a to i). MapList holds the two composed mappings {(a,a), (b,b)} and {(c,c), (a,a)}.]
occ = 1: CM_{O1,O2} = {(a,a), (b,b), (c,c)}
occ = 2: CM_{O1,O2} = {(a,a)}
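The composeMatch procedure can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: mappings are plain sets of concept pairs without similarity values, and `get_mapping` is a hypothetical lookup into precomputed mappings. The stored mappings are invented so that the two composed mappings equal the ones in the example, {(a,a), (b,b)} and {(c,c), (a,a)}.

```python
from collections import Counter


def compose(m1, m2):
    """Join two sets of concept pairs on the intermediate concept."""
    return {(c1, c2) for (c1, x) in m1 for (y, c2) in m2 if x == y}


def merge(map_list, occ):
    """Keep correspondences that occur in at least `occ` composed mappings."""
    counts = Counter(corr for mapping in map_list for corr in set(mapping))
    return {corr for corr, n in counts.items() if n >= occ}


def compose_match(o1, o2, intermediates, occ, get_mapping):
    map_list = []
    for io in intermediates:
        m_o1_io = get_mapping(o1, io)
        m_io_o2 = get_mapping(io, o2)
        map_list.append(compose(m_o1_io, m_io_o2))
    return merge(map_list, occ)


# Hypothetical precomputed mappings (illustrative data only).
STORE = {
    ("O1", "IO1"): {("a", "a"), ("b", "b")},
    ("IO1", "O2"): {("a", "a"), ("b", "b")},
    ("O1", "IO2"): {("a", "a"), ("c", "c")},
    ("IO2", "O2"): {("a", "a"), ("c", "c")},
}


def lookup(source, target):
    return STORE[(source, target)]


print(compose_match("O1", "O2", ["IO1", "IO2"], 1, lookup))
print(compose_match("O1", "O2", ["IO1", "IO2"], 2, lookup))
```

With occ = 1 the merge is a plain union, with occ = k it is an intersection, so occ trades recall against precision.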

12

EVALUATION SETUP

• Match problem
  • Adult Mouse Anatomy (MA)
  • NCI Thesaurus Anatomy part (NCIT)
• Gold standard ~1500 correspondences
• Precompute mappings using a match strategy

[Figure: MA and NCIT connected via the intermediate ontologies Uberon, UMLS, RadLex and FMA]

#concepts
  MA      ~2,700
  NCIT    ~3,300
  Uberon  ~5,000
  UMLS    ~88,000
  RadLex  ~30,800
  FMA     ~81,000

13

EVALUATION SETUP

• Match strategy used to precompute the mappings:
  Preprocessing (Normalization) → Linguistic Matcher (Name, synonyms, Trigram t = 0.8) → Selection & Postprocessing
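A linguistic matcher of this kind can be sketched as follows. The slides only state that names and synonyms are compared with a trigram measure and threshold t = 0.8; the concrete choices here (lower-casing as normalization, unpadded character trigrams, the Dice coefficient, taking the best label pair per concept pair) are assumptions, and the concept identifiers are hypothetical.

```python
def normalize(label: str) -> str:
    """Lower-case the label and collapse whitespace and underscores."""
    return " ".join(label.lower().replace("_", " ").split())


def trigrams(label: str) -> set[str]:
    text = normalize(label)
    if len(text) < 3:
        return {text}
    return {text[i:i + 3] for i in range(len(text) - 2)}


def trigram_sim(a: str, b: str) -> float:
    """Dice coefficient over the character trigram sets of both labels."""
    ta, tb = trigrams(a), trigrams(b)
    return 2 * len(ta & tb) / (len(ta) + len(tb))


def linguistic_match(o1, o2, threshold=0.8):
    """o1, o2: {concept_id: [name, synonym, ...]}; returns {(id1, id2): sim}."""
    mapping = {}
    for c1, labels1 in o1.items():
        for c2, labels2 in o2.items():
            sim = max(
                (trigram_sim(l1, l2) for l1 in labels1 for l2 in labels2),
                default=0.0,
            )
            if sim >= threshold:
                mapping[(c1, c2)] = sim
    return mapping


# Hypothetical concepts; the labels are borrowed from the vertebra example.
o1 = {"X:1": ["cervical vertebra 1"]}
o2 = {"Y:1": ["C1 Vertebra", "Atlas"], "Y:2": ["Cervical Vertebra 1"]}
print(linguistic_match(o1, o2))
```

In this toy input "cervical vertebra 1" and "C1 Vertebra" stay below the threshold although they name the same bone, which is the kind of gap that a synonym-rich intermediate ontology can close.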

14

RESULTS

• Direct match result compared to composeMatch via each IO
• Additional matching of unmatched parts (extendMatch)
• Uberon & UMLS → best evaluated intermediate ontologies

[Figure: match quality of the direct match and of composeMatch per Intermediate Ontology IO; values shown: 88.2% and 86%]

15

RESULTS

• Combination of four composed mappings
• Correspondences have to occur in at least 1, …, 4 mappings

union (occ=1): F-Measure 90.2, Precision 92.7, Recall 87.8

Higher occurrence → Recall ↓
extendMatch → Recall ↑
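The reported F-Measure is consistent with the usual harmonic mean of precision and recall, which a short check confirms. The function name is chosen here for illustration.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same scale as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values of the union (occ=1) result, in percent.
print(round(f_measure(92.7, 87.8), 1))  # 90.2
```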

16

RESULTS

Top Results OAEI: http://oaei.ontologymatching.org/[year]/anatomy

Other systems later adopted similar techniques to make use of domain specific background knowledge (e.g. including Uberon, UMLS)

17

COMPOSITION VIA SEVERAL SOURCES

• Many “mapping path” alternatives…
• Which intermediate source(s) should be used?

[Figure: source S and target T connected via alternative intermediate sources A, B, C. Example sources: GeoNames, LinkedGeoData; PubMed as intermediate source → wrong domain]

Hartung, Groß, Rahm: Composition Methods for Link Discovery. Proc. of 15. GI-Fachtagung für Datenbanksysteme in Business, Technologie und Web (BTW), 2013

18

COMPOSITION VIA SEVERAL SOURCES

[Figure, next build step: WorldFactBook as intermediate source → too special]

19

COMPOSITION VIA SEVERAL SOURCES

[Figure, final build step: DBpedia as intermediate source → ok, universal knowledge source]

20

COMPOSE OPERATOR

21

EFFECTIVENESS OF MAPPINGS FOR COMPOSITION

[Figure: Source S, Intermediate I and Target T, connected by the mappings M_{S,I} and M_{I,T}; marked are domain(M_{S,I}), range(M_{S,I}), domain(M_{I,T}) and range(M_{I,T})]

Binary:

n-ary:

1. Mapping coverage in S and T should be high

2. Overlap of entities in I should be high

22

DIFFERENT COMPOSITION STRATEGIES

Mapping-based
• Take all mapping paths between S and T
• Different path filtering methods
  1) Effectiveness: k most effective mapping paths (selEff)
  2) Complement: k best complementing mapping paths w.r.t. S and T (selComp)

Link-based
• Select best routes in a graph of links between entities/concepts (not on “mapping level”)
• Graph-based approach
  • Transformation of S, T and mappings in M into a weighted, directed graph
  • Application of Shortest-Path algorithm to solve mapping composition problem

Hartung, Groß, Rahm: Composition Methods for Link Discovery. Proc. of 15. GI-Fachtagung für Datenbanksysteme in Business, Technologie und Web (BTW), 2013
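The link-based strategy can be illustrated with Dijkstra's algorithm over a graph of individual links. The slides do not say how edge weights are derived; the sketch assumes every link carries a similarity in (0, 1] and uses -log(sim) as weight, so that the shortest path is the path with the highest product of similarities. Entity names and similarity values are invented.

```python
import heapq
import math


def build_graph(mappings):
    """mappings: iterable of link lists [(source_entity, target_entity, sim)]."""
    graph = {}
    for links in mappings:
        for src, dst, sim in links:
            # Assumed weighting: -log(sim), so path costs add up to
            # the negative log of the product of similarities.
            graph.setdefault(src, []).append((dst, -math.log(sim)))
            graph.setdefault(dst, [])
    return graph


def best_links(graph, source_entities, target_entities):
    """For each source entity return (best target entity, path similarity)."""
    result = {}
    for s in source_entities:
        dist = {s: 0.0}
        heap = [(0.0, s)]
        while heap:
            d, node = heapq.heappop(heap)
            if d > dist.get(node, math.inf):
                continue  # stale heap entry
            for nxt, weight in graph.get(node, []):
                nd = d + weight
                if nd < dist.get(nxt, math.inf):
                    dist[nxt] = nd
                    heapq.heappush(heap, (nd, nxt))
        reachable = [(dist[t], t) for t in target_entities if t in dist]
        if reachable:
            d, t = min(reachable)
            result[s] = (t, math.exp(-d))
    return result


# Two alternative routes from s1: via a1 (0.9 * 0.9) and via b1 (1.0 * 0.7).
m_s_a = [("s1", "a1", 0.9)]
m_a_t = [("a1", "t1", 0.9)]
m_s_b = [("s1", "b1", 1.0)]
m_b_t = [("b1", "t2", 0.7)]
graph = build_graph([m_s_a, m_a_t, m_s_b, m_b_t])
print(best_links(graph, ["s1"], ["t1", "t2"]))
```

Working on link level lets each entity choose its own route, whereas the mapping-based strategies choose whole mapping paths for all entities at once.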

23

EVALUATION

[Figure: F-measure (axis range 60 to 100) of the strategies all, selEff, selComp and link for the match tasks NYT-DBp, NYT-FB, NYT-GeoN (Geography, Instance Matching track) and MA-NCIT (Anatomy track)]

• selEff, selComp, link always better than naïve (all) approach
• Selection strategies better for Anatomy
• Link strategy slightly better for Geography
+ Best Compose approach always better than direct match

Reuse of mappings and composition strategies → very useful to create new correspondences/links

24

REUSE EXISTING MAPPINGS TO …

→ create new ontology mappings
  • “Indirect” matching: combine existing mappings to create new mappings between so far unconnected sources
→ create up-to-date ontology mappings
  • Migration of outdated mappings to currently valid ontology versions

1) Ontologies, ontology mappings, ontology evolution
2) Composition-based ontology matching
3) Adaptation of ontology mappings
4) Outlook

25

MAPPING ADAPTATION PROBLEM

• Mappings can become invalid → need to be updated
• Reuse existing mappings (avoid full re-determination)

[Figure: O1 and O2 evolve to O1' and O2'; given OM_{O1,O2}, the mapping OM_{O1',O2'} is sought (?)]

Requirements
• High mapping quality
• Mapping consistency
• Include new concepts
• Reduction of manual effort, involve user feedback
• Support of semantic mappings

Groß: Evolution von ontologiebasierten Mappings in den Lebenswissenschaften, Dissertation, Universität Leipzig, 2014.

Groß, Dos Reis, Hartung, Pruski, Rahm: Semi-automatic adaptation of mappings between life science ontologies. Proc. 9th Intl. Conference on Data Integration in the Life Sciences (DILS), 2013.

26

ADAPTATION APPROACHES

[Figure, first build step (O1 evolves to O1'): Composition-based Adaptation (CA) composes OM_{O1,O2} with the evolution mapping OM_{O1,O1'}; Diff-based Adaptation (DA) applies DiffAdapt with diff_{O1,O1'} to OM_{O1,O2}]

27

ADAPTATION APPROACHES

[Figure, next build step (both ontologies evolve, O1 to O1' and O2 to O2'): Composition-based Adaptation (CA) uses the evolution mappings OM_{O1,O1'} and OM_{O2,O2'}; Diff-based Adaptation (DA) uses diff_{O1,O1'} and diff_{O2,O2'}]

28

ADAPTATION APPROACHES

[Figure, final build step: both approaches derive the adapted mapping OM_{O1',O2'} from OM_{O1,O2}. Composition-based Adaptation (CA): compose with OM_{O1,O1'} and OM_{O2,O2'}. Diff-based Adaptation (DA): DiffAdapt with diff_{O1,O1'} and diff_{O2,O2'}]

29

GOMMA

• Generic Ontology Matching and Mapping Management
• Generic infrastructure to manage and analyze evolution of ontologies and mappings
• COnto-Diff: Diff Evolution Mapping diff(O_old, O_new)
  • Based on match mapping between two ontology versions O_old and O_new
  • Set of basic and complex change operations
    • addC, addR, …
    • delC, delR, toObsolete, …
    • split, merge, substitute, …

30

COMPOSITION-BASED ADAPTATION

• Combine ‘old’ ontology mapping with ontology evolution mapping (between old and new version): compose operator
• Reuse and adapt existing correspondences
• Semantic correspondence types?
+ Matching added concepts (O1' \ O1, O2' \ O2)

[Figure: O1 (body, head, neck, trunk, limbs, upper extremities, lower extremities, tail), O2 (body, head, neck, limbs, limb segments, tail) and the new version O2' (body, head and neck, trunk, limbs, lower limbs, upper limbs), connected by OM_{O1,O2} and the evolution mapping OM_{O2,O2'}]

semType:
= equivalent
< less general
> more general

31

COMPOSITION-BASED ADAPTATION

[Figure, next build step: only O1 and the new version O2' remain; the adapted mapping OM_{O1,O2'} and the semantic types of its correspondences are still open (?)]

32

COMBINATION OF SEMANTIC CORRESPONDENCES

• Correspondence (c1, c2), c1 ∈ O1, c2 ∈ O2
  • semType ∈ {=, <, >, ≈}
  • status ∈ {handled, toVerify}

Combination of semType1 (rows) with semType2 (columns):

      =   <   >   ≈
  =   =   <   >   ≈
  <   <   <   ≈   ≈
  >   >   ≈   >   ≈
  ≈   ≈   ≈   ≈   ≈

Example 1 (O1, O2, O2'): head = head and neck = neck (semType1), head < head and neck and neck < head and neck (semType2)
  • compose → head < head and neck, neck < head and neck
  • Status: handled

Example 2 (O1, O2, O2'): lower extremities < limb segments and upper extremities < limb segments (semType1), limb segments > lower limbs and limb segments > upper limbs (semType2)
  • Semantic type: ≈
  • Status: toVerify
  • compose → 4 correspondences
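The combination table lends itself to a lookup-based sketch. The dictionary below transcribes the table from the slide; the rule that only ≈ results get the status toVerify generalizes the two examples and is an assumption, as is the tuple representation of correspondences.

```python
# (semType1, semType2) -> combined semantic type, transcribed from the table.
COMBINE = {
    ("=", "="): "=", ("=", "<"): "<", ("=", ">"): ">", ("=", "≈"): "≈",
    ("<", "="): "<", ("<", "<"): "<", ("<", ">"): "≈", ("<", "≈"): "≈",
    (">", "="): ">", (">", "<"): "≈", (">", ">"): ">", (">", "≈"): "≈",
    ("≈", "="): "≈", ("≈", "<"): "≈", ("≈", ">"): "≈", ("≈", "≈"): "≈",
}


def compose_semantic(om_o1_o2, om_o2_o2new):
    """Compose (c1, c2, semType) triples over the shared O2 concept."""
    result = []
    for c1, c2, type1 in om_o1_o2:
        for c2_old, c2_new, type2 in om_o2_o2new:
            if c2 == c2_old:
                sem_type = COMBINE[(type1, type2)]
                # Assumption: imprecise results need expert verification.
                status = "toVerify" if sem_type == "≈" else "handled"
                result.append((c1, c2_new, sem_type, status))
    return result


om_o1_o2 = [
    ("head", "head", "="),
    ("lower extremities", "limb segments", "<"),
    ("upper extremities", "limb segments", "<"),
]
om_o2_o2new = [
    ("head", "head and neck", "<"),
    ("limb segments", "lower limbs", ">"),
    ("limb segments", "upper limbs", ">"),
]
for corr in compose_semantic(om_o1_o2, om_o2_o2new):
    print(corr)
```

The split of limb segments yields all four combinations of extremities and limbs, each marked toVerify, which matches the second example and shows why pure composition tends to lose precision.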

33

DIFF-BASED ADAPTATION OF ONTOLOGY MAPPINGS

• Modular, flexible adaptation approach
• Individual migration for different change operations using Change Handler CH
• Reuse and adaptation of existing correspondences

34

DIFF-BASED ADAPTATION OF ONTOLOGY MAPPINGS

[Figure: O1 (body, head, neck, trunk, limbs, upper extremities, lower extremities, tail), O2 (body, head, neck, limbs, limb segments, tail) and O2' (body, head and neck, trunk, limbs, lower limbs, upper limbs) with the mappings OM_{O1,O2} and OM_{O2,O2'}]

35

DIFF-BASED ADAPTATION OF ONTOLOGY MAPPINGS

[Figure: O1, O2 and O2' with the mappings OM_{O1,O2} and OM_{O2,O2'}, annotated with the change operations of diff_{O2,O2'}]

diff_{O2,O2'}:
• merge({head, neck}, head and neck)
• split(limb segments, {lower limbs, upper limbs})
• addC(trunk)
• delC(tail)

36

DIFF-BASED ADAPTATION OF ONTOLOGY MAPPINGS

DiffAdapt(OM_{O2,O1}, diff_{O2,O2'}, O2, O2', O1, CH)

1. Determination of affected correspondences OM_infl using diff_{O2,O2'}
2. Reuse of unaffected mapping part: OM_{O2',O1} ← OM_{O2,O1} \ OM_infl
3. For each ch ∈ CH
   • Adaptation of OM_infl using a change handler strategy (diff_{O2,O2'}, O2, O2', O1)
4. Union of OM_infl with unaffected mapping part: OM_{O2',O1} ← OM_{O2',O1} ∪ OM_infl
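The four steps can be sketched as follows. This is an illustration under stated assumptions, not GOMMA code: the classes `Corr` and `Change`, the handler functions and the dictionary `CH` are hypothetical; correspondences are stored with the O1 concept first; the merge handler downgrades = to <; and the "take best" split handler picks the split result whose label is most similar according to `difflib.SequenceMatcher`, a stand-in for whatever similarity the authors use.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Corr:
    c1: str                  # concept of O1
    c2: str                  # concept of O2 (O2' after adaptation)
    sem_type: str = "="
    status: str = "handled"


@dataclass(frozen=True)
class Change:
    op: str                  # e.g. "merge", "split", "delC", "addC"
    old: tuple = ()          # affected concepts of O2
    new: tuple = ()          # resulting concepts of O2'


def merge_handler(change, affected):
    target = change.new[0]
    adapted = set()
    for c in affected:
        if c.sem_type in ("=", "<"):   # merged concept is more general
            adapted.add(Corr(c.c1, target, "<", "handled"))
        else:                          # assumption: anything else needs review
            adapted.add(Corr(c.c1, target, "≈", "toVerify"))
    return adapted


def split_handler(change, affected):
    def best(label):                   # "take best": most similar split result
        return max(change.new,
                   key=lambda t: SequenceMatcher(None, label, t).ratio())
    return {Corr(c.c1, best(c.c1), "≈", "toVerify") for c in affected}


def delete_handler(change, affected):
    return set()                       # correspondences to deleted concepts vanish


CH = {"merge": merge_handler, "split": split_handler, "delC": delete_handler}


def diff_adapt(om, diff, handlers=CH):
    changed = {concept for ch in diff for concept in ch.old}
    om_infl = {c for c in om if c.c2 in changed}       # step 1
    adapted = set(om) - om_infl                        # step 2
    migrated = set()
    for op, handler in handlers.items():               # step 3
        for ch in diff:
            if ch.op == op:
                hit = {c for c in om_infl if c.c2 in ch.old}
                migrated |= handler(ch, hit)
    return adapted | migrated                          # step 4


OM_O1_O2 = {
    Corr("body", "body"), Corr("head", "head"), Corr("neck", "neck"),
    Corr("limbs", "limbs"), Corr("tail", "tail"),
    Corr("lower extremities", "limb segments", "<"),
    Corr("upper extremities", "limb segments", "<"),
}
DIFF = [
    Change("merge", ("head", "neck"), ("head and neck",)),
    Change("split", ("limb segments",), ("lower limbs", "upper limbs")),
    Change("addC", (), ("trunk",)),
    Change("delC", ("tail",), ()),
]
for corr in sorted(diff_adapt(OM_O1_O2, DIFF), key=lambda c: c.c1):
    print(corr)
```

Added concepts such as trunk have no old correspondences, so they pass through untouched here and would be covered by a separate match step for new concepts.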

[Figure: O1, O2 and O2' with the mappings OM_{O1,O2} and OM_{O2,O2'}; the correspondences are divided into the affected part OM_infl and the unaffected part]

37

EXAMPLES

MergeHandler: merge({head, neck}, head and neck)
  • O1 head = O2 head, O2 head < O2' head and neck → head < head and neck (handled)
  • O1 neck = O2 neck, O2 neck < O2' head and neck → neck < head and neck (handled)

SplitHandler, “take best”: split(limb segments, {lower limbs, upper limbs})
  • O1 lower extremities < O2 limb segments, O1 upper extremities < O2 limb segments
  • O2 limb segments > O2' lower limbs, O2 limb segments > O2' upper limbs
  • → lower extremities ≈ lower limbs (toVerify)
  • → upper extremities ≈ upper limbs (toVerify)

38

EVALUATION

• UMLS Mapping versions: “silver standard”
• Adaptation of 2009 version, reference mapping: 2012 version

Ontology size
                  FMA      NCIT     SCT
  #Concepts2009   62,285   63,655   310,121
  #Concepts2012   62,285   84,132   318,502

Mapping size
              SCT-NCIT
  #Corr2009   19,971
  #Corr2012   22,732

Ontology changes
• merge, split, …
• Many concept additions and toObsolete changes

[Figure: # changes for NCIT and SCT, logarithmic axis from 1 to 100,000]

Mapping changes
• 8% delCorr
• 19% addCorr

39

MAPPING QUALITY SCT-NCIT

[Figure: Precision, Recall and F-Measure (axis range 70 to 100) for Unaff, the composition-based approaches CA and CA+m, and the diff-based approaches DA and DA+m; reference lines mark Recall and F-Measure of Unaff]

• Unaffected correspondences only (Unaff): good results
• CA: Precision ↓
• CA+m: Recall ↑, F-Measure ≈ 90%
• Diff-based approaches: increased quality, especially Precision ↑
• DA+m: best quality, F-Measure ≈ 94%

40

SEMI-AUTOMATIC MAPPING ADAPTATION

Adaptation Strategy
1) Automatic detection of consistent mappings w.r.t. new ontology version
2) Recommendations for new correspondences → Aim: complete mapping
3) Expert validation of correspondence (toVerify status)

• High mapping quality
• Consistent mapping
• New correspondences for new concepts
• Reduction of manual effort
• Consider mapping semantics

41

OUTLOOK

• Ontology matching and entity linking
  • Integration of larger sets of heterogeneous sources: holistic matching and reuse of clustered entities
  • Semantic enrichment with concepts of ontologies
  • Interactive tools for link verification
• Mapping semantics
  • Use of semantic relationships (is-a, part-of, …) in mappings and Diff
• Evolution and adaptation of ontology-based annotations