Date post: 03-Aug-2020
Evaluating Ontology Alignment Techniques Willem Robert van Hage VU University Amsterdam
Transcript
Page 1: Evaluating Ontology Alignment Techniques · what did I study? • text mining techniques to find BT (subclass and part-whole) relations in text • using respectively NAL, FAO; and

Evaluating Ontology Alignment Techniques

Willem Robert van Hage
VU University Amsterdam

what did I study?

• text mining techniques to find BT (broader term: subclass and part-whole) relations in text

• using NAL and FAO data for subclass relations, and FDA, EPA, and WHO data for part-whole relations

• various sample-based evaluation techniques

• end-to-end application evaluation versus stratified sampling

• the quality of current state of the art thesaurus alignment techniques

• together with NAL, FAO, and EEA

• at NKOS 2008 Lori Finch talked about our work on comparative evaluation tasks at the OAEI 2006/2007

OAEI food & environment tasks: http://www.few.vu.nl/~wrvhage/oaei2007/
PhD thesis: http://www.few.vu.nl/~wrvhage/papers/wrvh_thesis_20080724.pdf

[Figure: an AGRIS document with the title "Free amino acids in the roots of finger-millet plants infected with ring nematodes". An exactMatch links an AGROVOC concept to a NALT concept (labels shown: "plant parasitic nematodes", "phytonematodes", "plant nematodes"). NALT lists "cyst nematodes" and "ring nematodes" as narrower terms. Two queries are shown: "phytonematodes" and "cyst nematodes" OR "ring nematodes".]

exploit combined knowledge

retrieve more documents in FAO’s AGRIS using narrower terms from NALT
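
The retrieval idea on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the system used in the talk: the dictionaries are toy data, and the assignment of labels to AGROVOC or NALT is an assumption made for the example.

```python
# Toy data for illustration only. Which label belongs to which thesaurus,
# and the dictionary layout, are assumptions, not real AGROVOC/NALT exports.
EXACT_MATCH = {"plant nematodes": "plant parasitic nematodes"}  # AGROVOC -> NALT
NALT_NARROWER = {"plant parasitic nematodes": ["cyst nematodes", "ring nematodes"]}


def expanded_terms(agrovoc_term):
    """Return the term plus the narrower terms of its exactMatch in NALT."""
    terms = [agrovoc_term]
    nalt_term = EXACT_MATCH.get(agrovoc_term)
    if nalt_term is not None:
        terms.extend(NALT_NARROWER.get(nalt_term, []))
    return terms


def expand_query(agrovoc_term):
    """Build a quoted OR query from the expanded term list."""
    return " OR ".join(f'"{term}"' for term in expanded_terms(agrovoc_term))


def title_matches(title, agrovoc_term):
    """True if the title contains the term or any of its expansions."""
    lowered = title.lower()
    return any(term in lowered for term in expanded_terms(agrovoc_term))


TITLE = ("Free amino acids in the roots of finger-millet plants "
         "infected with ring nematodes")
print(expand_query("plant nematodes"))
print(title_matches(TITLE, "plant nematodes"))
```

Without the alignment, a search for the broader term alone would miss this title, because the title only mentions the narrower term "ring nematodes".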

some numbers

• OAEI 2007 food & environment tasks (fully automatic)

• mostly but not only skos:exactMatch

• sample evaluation ±1650 mappings

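thesaurus sizes:

AGROVOC: 28445 descriptors, 12531 non-descriptors
GEMET: 5398 descriptors
NALT: 42326 descriptors, 25984 non-descriptors

[Figure: mapping counts between the three thesauri. Counts shown: 4106 exactMatch; 37310 exactMatch, 2328 broadMatch, 3710 narrowMatch; 4984 exactMatch. The extracted text does not preserve which thesaurus pair each count belongs to.]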
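
A sample evaluation over a few thousand mappings is typically turned into an overall estimate by weighting each stratum by its size. The sketch below shows that standard stratified estimate of precision. The strata and all numbers in the usage line are made up for illustration and are not the OAEI figures.

```python
def stratified_precision(strata):
    """Estimate overall precision from per-stratum samples.

    strata: list of (population_size, sample_size, correct_in_sample).
    Each stratum's sample precision is weighted by its share of the population.
    """
    total = sum(population for population, _, _ in strata)
    if total == 0:
        raise ValueError("empty population")
    estimate = 0.0
    for population, sampled, correct in strata:
        if sampled <= 0 or correct > sampled:
            raise ValueError("invalid sample counts")
        estimate += (population / total) * (correct / sampled)
    return estimate


# Hypothetical strata: (mappings in stratum, mappings judged, judged correct).
example = [(6000, 100, 90), (3000, 100, 60), (1000, 50, 25)]
print(round(stratified_precision(example), 2))  # 0.77
```

Large strata dominate the estimate, so a small stratum with poor precision changes the overall number only slightly.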

OAEI 2006 food task

[Figure: bar charts of the number of results, Precision, and Recall for Falcon-AO, RiMOM, Prior, COMA++, and HMatch, with a legend splitting results into exactMatch, broadMatch & narrowMatch, and disjoint. Result counts shown: 20,001; 15,496; 13,975; 13,009; 11,511. Score values shown: 0.65, 0.33, 0.64, 0.71, 0.65 in one chart and 0.61, 0.54, 0.71, 0.81, 0.83 in the other. A pie chart of topics uses the categories taxonomical, biological & chemical, and miscellaneous (geographical, legislation, food stuffs, etc.), with shares of 60%, 32%, and 8%. The extracted text does not preserve which value belongs to which system or topic.]

OAEI 2007 food task

[Figure: bar charts of the number of results, Precision, and Recall for Falcon-AO, DSSim, X-SOM, SCARLET, and RiMOM, with panels labelled "overall precision (100%)" and "only exactMatch (81)", a good/bad legend, and results split into exactMatch, broadMatch & narrowMatch, and disjoint. Result counts shown: 18,420; 15,300; 14,962; 6,583; and one breakdown of 81 exact, 6,038 b & n, 647 disjoint. Score values shown: 0.62, 0.60, 0.45, 0.49, 0.83. A pie chart, "topics in the results NALT-AGROVOC", uses the categories taxonomical (animals, plants, etc.), biological & chemical, miscellaneous (farming systems, ecology, etc.), and geographical, with shares of 60%, 26%, 10%, and 3%. The extracted text does not preserve which value belongs to which system or topic.]

OAEI 2007 environment task

[Figure: for the GEMET-AGROVOC and GEMET-NALT tasks, bar charts of the number of results, "overall precision (100%, 100%)", and "recall of only exactMatch" for Falcon-AO and DSSim, with a good/bad legend and results split into exactMatch, broadMatch & narrowMatch, and disjoint. Result counts shown: 3,030 and 1,384 in one panel, 4,278 and 1,374 in the other. Bar labels shown in the score charts: 0.67 and 0.33, 0.12 and 0.88, 0.56 and 0.44, 0.14 and 0.86, 0.76 and 0.24, 0.40 and 0.60, 0.72 and 0.28, 0.50 and 0.50. Two pie charts, "topics in the results GEMET-AGROVOC" and "topics in the results GEMET-NALT", use the categories geographical, biological & chemical, miscellaneous, taxonomical, natural resources, and food safety, with shares of 15%, 10%, 12%, 46%, 13%, 4% in one chart and 15%, 9%, 16%, 40%, 17%, 3% in the other. The extracted text does not preserve which value belongs to which system, task, or topic.]

conclusions

• results improved significantly, especially in Recall, but interesting matches are still missing

• system design lessons learnt:

• systems should first find the easy matches and then carefully extend to harder matches

• systems should only try to find more matches when they do not already have a good match

• systems should attempt to learn which lexical patterns hold in parts of the thesauri to distinguish “Bos taurus” < “Bos” from “lime stone” < “stone”

• systems should attempt to exploit background knowledge; alignment is really “AI-hard”
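
The first two design lessons can be sketched as a two-pass matcher: take the easy matches first, then try a more permissive technique only for concepts that still lack a match. This is a minimal sketch under stated assumptions, not any OAEI participant's method. The function names, the use of `difflib` string similarity for the second pass, and the 0.85 threshold are all choices made for the example.

```python
from difflib import SequenceMatcher


def normalize(label):
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(label.lower().split())


def align(source_labels, target_labels, threshold=0.85):
    """Two-pass label matcher: exact matches first, fuzzy only for leftovers."""
    target_index = {normalize(t): t for t in target_labels}
    matches = {}

    # Pass 1: easy matches, identical labels after normalization.
    for s in source_labels:
        hit = target_index.get(normalize(s))
        if hit is not None:
            matches[s] = (hit, 1.0)

    # Pass 2: harder matches, attempted only for concepts still unmatched.
    taken = {target for target, _ in matches.values()}
    for s in source_labels:
        if s in matches:
            continue
        best, best_score = None, 0.0
        for t in target_labels:
            if t in taken:
                continue
            score = SequenceMatcher(None, normalize(s), normalize(t)).ratio()
            if score > best_score:
                best, best_score = t, score
        if best is not None and best_score >= threshold:
            matches[s] = (best, best_score)
            taken.add(best)
    return matches


result = align(["Nematodes", "plant nematode", "lime stone"],
               ["nematodes", "plant nematodes", "stone"])
print(result)
```

In this toy run "lime stone" stays unmatched rather than being equated with "stone". A broader/narrower relation between such pairs is what the lexical-pattern lesson is about, and plain string similarity does not capture it.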

Evaluating Ontology Alignment Techniques: why bother?

Willem Robert van Hage
VU University Amsterdam

two approaches

• If you want to do information integration and you need to combine vocabularies you can do:

• ontology merging

• start with two ontologies, end with one

• merge some concepts, copy others, perhaps delete some

• ontology alignment

• start with two ontologies, end with three

• add relations between concepts, sometimes add intermediate concepts

• two ontologies stay unchanged
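
"Start with two ontologies, end with three" can be made concrete with a small data-structure sketch: two hierarchies that are never modified, plus a third, separate collection of mappings. The concept names, the `a:`/`b:` prefixes, and the traversal function are hypothetical and chosen only for illustration.

```python
from typing import NamedTuple


class Mapping(NamedTuple):
    source: str
    relation: str  # "exactMatch" or "broadMatch" in this toy example
    target: str


# Two toy ontologies as {concept: broader concept}; they are only read.
ONTOLOGY_A = {"a:parrots": "a:birds", "a:birds": "a:animals"}
ONTOLOGY_B = {"b:birds": "b:animal kingdom"}

# The alignment is a third, separate collection.
ALIGNMENT = [
    Mapping("a:parrots", "broadMatch", "b:birds"),
    Mapping("a:birds", "exactMatch", "b:birds"),
    Mapping("a:animals", "exactMatch", "b:animal kingdom"),
]


def broader_or_equivalent(concept):
    """Collect concepts reachable via broader links and alignment mappings."""
    broader = {**ONTOLOGY_A, **ONTOLOGY_B}
    seen, todo = set(), [concept]
    while todo:
        current = todo.pop()
        steps = []
        if current in broader:
            steps.append(broader[current])
        for m in ALIGNMENT:
            if m.source == current:
                steps.append(m.target)
            elif m.target == current and m.relation == "exactMatch":
                steps.append(m.source)  # exactMatch can be followed both ways
        for nxt in steps:
            if nxt != concept and nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return seen


print(sorted(broader_or_equivalent("a:parrots")))
```

An application can drop or replace `ALIGNMENT` at any time and both ontologies remain exactly as their owners published them.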

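[Figure: the same concepts (parrots, birds, animals, animal kingdom) combined in two ways. Under "merging" they end up in a single hierarchy; under "alignment" the two hierarchies stay separate and are linked by exactMatch and broadMatch mappings.]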

merging or alignment?

alignment.

why not merge?

[Figure: a visual joke in which two items are added together ("+") and the result ("=") is a scramble of letters.]

just kidding...

merging or alignment? alignment.

• alignment gives you more freedom to manage the combined resources in the future

• three important properties of ontology alignment:

1. the alignment itself is a separate collection

2. alignment relations allow for subtle differences to be pointed out, but not removed

3. the original thesauri can keep their own separate lives while applications can make combined use of them

issues with merging

• legal issues

• who owns the result?

• what about ownership of past and future versions?

• maintenance issues

• who is allowed to change the resulting thesaurus?

• who will pay for future modifications?

• security issues

• future changes might reveal confidential plans to parties

• legacy issues

• software and internal policies will have to be adapted to deal with the new “world view”

different points of view

• alignment allows different points of view to coexist

• is that good or bad? – it’s better than bad, it’s good!

• you can always ignore the other perspectives, while you can benefit from them whenever you like: “you never lose”

• sometimes it is very interesting to see where the meanings of concepts clash

• it is definitely good in cases where merging is politically impossible or cooperation is hard to organize

• on the web this is very common

dealing with differences

• within a thesaurus mixed points of view should be avoided, but when you cooperate they are unavoidable

• you have to deal with them one way or another

• you can sit together, work out who’s wrong and update the ontologies

• you can ignore the problem and not link to each other

• you can describe the differences and decide how to deal with them whenever it becomes relevant

rdfs:subPropertyOf, skos:closeMatch, skos:broadMatch, etc. (as opposed to owl:sameAs or owl:equivalentClass) are your friends
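
As a small illustration of describing a difference instead of erasing it, the sketch below writes mappings as N-Triples lines using the SKOS mapping properties. The SKOS namespace is the real one; the `example.org` concept URIs and the helper name `ntriple` are hypothetical.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"

# The SKOS mapping properties; owl:sameAs is deliberately not accepted here.
MAPPING_PROPERTIES = {
    "exactMatch", "closeMatch", "broadMatch", "narrowMatch", "relatedMatch",
}


def ntriple(subject, relation, obj):
    """Format one mapping as an N-Triples line using a SKOS mapping property."""
    if relation not in MAPPING_PROPERTIES:
        raise ValueError(f"not a SKOS mapping property: {relation}")
    return f"<{subject}> <{SKOS}{relation}> <{obj}> ."


# Hypothetical URIs; real AGROVOC and NALT concept URIs look different.
print(ntriple("http://example.org/agrovoc/Ireland",
              "broadMatch",
              "http://example.org/nalt/BritishIsles"))
```

A broadMatch like this records that one concept is narrower than the other without claiming they are the same thing, which an owl:sameAs statement would do.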

[Figure: the AGROVOC and NALT hierarchies around Ireland, side by side. Concepts shown: British Isles, United Kingdom, Ireland, Northern Ireland, and Irish Republic, linked by broader relations within each thesaurus and by exactMatch, narrowMatch, and broadMatch mappings between them.]

example: let’s think about the consequences

[Figure: the AGROVOC and NALT hierarchies around Andorra, side by side. Concepts shown: Andorra, Western Europe, Western European region, Europe, and named geographical regions, linked by broader and related relations within each thesaurus and by exactMatch and narrowMatch mappings between them.]

example: let’s think about the consequences

a final remark about power and the web

• in the past you gained the most power by constraining access to your information

• now you can also gain power by having people use your information and extend it for you

• sharing makes you a de facto authority: people use whatever works and is available

• sharing makes others do part of your work for you: when other people openly link their information to yours you can also make use of the link

• consider benefitting from publishing linked data by making it or by aligning with it

Linked Data: http://linkeddata.org
