Evaluating Ontology Alignment Techniques
Willem Robert van Hage, VU University Amsterdam
what did I study?
• text mining techniques to find BT (subclass and part-whole) relations in text
• for subclass relations using NAL and FAO data; for part-whole relations using FDA, EPA, and WHO data
• various sample-based evaluation techniques
• end-to-end application evaluation versus stratified sampling
• the quality of current state-of-the-art thesaurus alignment techniques
• together with NAL, FAO, and EEA
• at NKOS 2008, Lori Finch talked about our work on the comparative evaluation tasks at OAEI 2006/2007
OAEI food & environment tasks: http://www.few.vu.nl/~wrvhage/oaei2007/
PhD thesis: http://www.few.vu.nl/~wrvhage/papers/wrvh_thesis_20080724.pdf
example: exploit combined knowledge
[figure: query expansion across AGROVOC and NALT]
• AGRIS document title: "Free amino acids in the roots of finger-millet plants infected with ring nematodes"
• AGROVOC "plant parasitic nematodes" exactMatch NALT "phytonematodes" ("plant nematodes")
• in NALT, "cyst nematodes" and "ring nematodes" are narrower than "phytonematodes"
• query "phytonematodes" expands to "cyst nematodes" OR "ring nematodes"
• retrieve more documents in FAO’s AGRIS using narrower terms from NALT
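The expansion step above can be sketched in a few lines. This is a minimal illustration, assuming the alignment and the NALT hierarchy are available as plain Python dicts keyed by preferred label; the data is a hypothetical excerpt modelled on the example, and the function names are made up for the sketch.

```python
from collections import deque

# Hypothetical excerpt modelled on the example (not real thesaurus data).
# Alignment: AGROVOC label -> NALT label (skos:exactMatch)
EXACT_MATCH = {"plant parasitic nematodes": "phytonematodes"}

# NALT hierarchy: concept -> direct narrower concepts (skos:narrower)
NALT_NARROWER = {"phytonematodes": ["cyst nematodes", "ring nematodes"]}


def narrower_closure(concept, narrower):
    """All concepts below `concept`, breadth-first, without duplicates."""
    found = []
    queue = deque(narrower.get(concept, []))
    while queue:
        c = queue.popleft()
        if c in found:
            continue
        found.append(c)
        queue.extend(narrower.get(c, []))
    return found


def expand_terms(agrovoc_term):
    """The term itself, its NALT exactMatch, and every NALT concept
    narrower than that match."""
    terms = [agrovoc_term]
    nalt_term = EXACT_MATCH.get(agrovoc_term)
    if nalt_term is not None:
        terms.append(nalt_term)
        terms.extend(narrower_closure(nalt_term, NALT_NARROWER))
    return terms


def to_query(terms):
    """Boolean OR query, one quoted phrase per term."""
    return " OR ".join(f'"{t}"' for t in terms)


def title_matches(terms, title):
    """Naive case-insensitive phrase match against a document title."""
    return any(t.lower() in title.lower() for t in terms)


title = ("Free amino acids in the roots of finger-millet plants "
         "infected with ring nematodes")
terms = expand_terms("plant parasitic nematodes")
print(to_query(terms))
# "plant parasitic nematodes" OR "phytonematodes" OR "cyst nematodes" OR "ring nematodes"
print(title_matches(["plant parasitic nematodes"], title), title_matches(terms, title))
# False True
```

The unexpanded AGROVOC term misses the AGRIS document; only the narrower NALT term "ring nematodes" finds it. A real system would query AGRIS's own index rather than match titles by substring.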
some numbers
• OAEI 2007 food & environment tasks (fully automatic)
• mostly, but not only, skos:exactMatch
• sample-based evaluation of ±1,650 mappings
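[figure: thesaurus sizes and numbers of mappings]
• AGROVOC: 28,445 descriptors, 12,531 non-descriptors
• GEMET: 5,398 descriptors
• NALT: 42,326 descriptors, 25,984 non-descriptors
• mapping counts shown between the thesauri: 4,106 exactMatch; 37,310 exactMatch, 2,328 broadMatch, and 3,710 narrowMatch; 4,984 exactMatch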
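Judging every mapping by hand is infeasible at these sizes, hence sample-based evaluation. A minimal sketch of a stratified precision estimate, assuming mappings are grouped into strata (for instance by topic) and each sampled mapping is judged correct or not; the function name and the fixed per-stratum sample size are illustrative assumptions, not the exact sampling design used at OAEI.

```python
import math
import random


def stratified_precision(strata, judge, n_per_stratum, seed=0):
    """Estimate the precision of an alignment with stratified sampling.

    strata:        dict, stratum name (e.g. a topic) -> list of mappings
    judge:         callable, True if a mapping is judged correct
                   (in a real evaluation: a human assessor)
    n_per_stratum: how many mappings to judge per stratum
    Returns (estimate, standard_error).
    """
    rng = random.Random(seed)
    total = sum(len(ms) for ms in strata.values())
    estimate, variance = 0.0, 0.0
    for mappings in strata.values():
        N_h = len(mappings)
        if N_h == 0:
            continue
        n_h = min(n_per_stratum, N_h)
        sample = rng.sample(mappings, n_h)
        p_h = sum(1 for m in sample if judge(m)) / n_h
        w_h = N_h / total  # weight: the stratum's share of all mappings
        estimate += w_h * p_h
        if n_h > 1:
            # (1 - n_h / N_h) is the finite population correction;
            # it is 0 when the whole stratum has been judged.
            variance += w_h ** 2 * (1 - n_h / N_h) * p_h * (1 - p_h) / (n_h - 1)
    return estimate, math.sqrt(variance)


# Usage with synthetic mappings and a synthetic judge.
strata = {"taxonomical": list(range(600)), "miscellaneous": list(range(600, 1000))}
est, se = stratified_precision(strata, lambda m: m % 4 != 0, n_per_stratum=100)
print(f"precision ~ {est:.2f} +/- {1.96 * se:.2f}")
```

Weighting each stratum by its size keeps a large, easy stratum from dominating the sample while still giving an unbiased overall estimate.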
[figure: OAEI 2006 food task: number of results (exactMatch, broadMatch & narrowMatch, disjoint), Precision, and Recall for Falcon-AO, RiMOM, Prior, COMA++, and HMatch; topics in the results: biological & chemical, miscellaneous (geographical, legislation, food stuffs, etc.), taxonomical]
[figure: OAEI 2007 food task, NALT-AGROVOC: number of results (exactMatch, broadMatch & narrowMatch, disjoint); overall precision (100%); only exactMatch (81); Precision and Recall for Falcon-AO, DSSim, X-SOM, SCARLET, and RiMOM, judged good/bad; topics in the results: geographical, biological & chemical, miscellaneous (farming systems, ecology, etc.), taxonomical (animals, plants, etc.)]
[figure: OAEI 2007 environment task, GEMET-AGROVOC and GEMET-NALT: topics in the results (geographical, biological & chemical, miscellaneous, taxonomical, natural resources, food safety); number of results (exactMatch, broadMatch & narrowMatch, disjoint), overall precision (100%, 100%), and recall of only exactMatch for Falcon-AO and DSSim, judged good/bad]
conclusions
• results improved significantly, especially in recall, but interesting matches are still missing
• system design lessons learnt:
• systems should first find the easy matches and then carefully extend to harder matches
• systems should only try to find more matches when they do not already have a good match
• systems should attempt to learn which lexical patterns hold in parts of the thesauri to distinguish “Bos taurus” < “Bos” from “lime stone” < “stone”
• systems should attempt to exploit background knowledge; alignment is really “AI-hard”
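Three of these lessons can be made concrete in a toy matcher. This is a sketch under stated assumptions, not the method of any OAEI system: thesauri are dicts of concept IDs to labels, stage 1 finds only exact label matches, stage 2 runs only for concepts still unmatched, and a hypothetical `learn_pattern` helper picks, per part of a thesaurus, whether the first word (“Bos taurus” < “Bos”) or the last word (“lime stone” < “stone”) names the broader concept.

```python
def normalize(label):
    """Lower-case and collapse whitespace."""
    return " ".join(label.lower().split())


def label_index(thesaurus):
    """normalized label -> concept id (first concept wins on clashes)."""
    index = {}
    for cid, labels in thesaurus.items():
        for label in labels:
            index.setdefault(normalize(label), cid)
    return index


def exact_stage(source, target):
    """Stage 1, the easy matches: concepts sharing a normalized label."""
    index = label_index(target)
    matches = {}
    for sid, labels in source.items():
        for label in labels:
            tid = index.get(normalize(label))
            if tid is not None:
                matches[sid] = tid
                break
    return matches


def head_word(label, pattern):
    """Word expected to name the broader concept of a multi-word label.
    'last':  compound nouns, e.g. 'lime stone' -> 'stone'
    'first': binomial names, e.g. 'Bos taurus' -> 'bos'
    """
    words = normalize(label).split()
    if len(words) < 2:
        return None
    return words[-1] if pattern == "last" else words[0]


def learn_pattern(known_broader_pairs):
    """Pick the pattern that explains most known (narrower, broader)
    label pairs taken from one part of a thesaurus."""
    scores = {"first": 0, "last": 0}
    for narrow, broad in known_broader_pairs:
        for pattern in scores:
            if head_word(narrow, pattern) == normalize(broad):
                scores[pattern] += 1
    return max(scores, key=scores.get)


def broad_stage(source, target, exact, pattern):
    """Stage 2, harder matches: only for concepts without an exact match,
    propose a broadMatch to the target concept named by the head word."""
    index = label_index(target)
    proposals = {}
    for sid, labels in source.items():
        if sid in exact:
            continue  # already has a good match: do not look for more
        for label in labels:
            tid = index.get(head_word(label, pattern))
            if tid is not None:
                proposals[sid] = tid
                break
    return proposals


source = {"s1": ["lime stone"], "s2": ["Stone"]}
target = {"t1": ["stone"], "t2": ["limestone"]}
exact = exact_stage(source, target)                                     # {'s2': 't1'}
pattern = learn_pattern([("sand stone", "stone"), ("lime stone", "stone")])  # 'last'
print(exact, broad_stage(source, target, exact, pattern))               # ... {'s1': 't1'}
```

Note that the toy heuristic misses the exact match between “lime stone” and “limestone”; that gap is where background knowledge would have to come in.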
Evaluating Ontology Alignment Techniques: why bother?
two approaches
• if you want to integrate information and need to combine vocabularies, you can use:
• ontology merging
• start with two ontologies, end with one
• merge some concepts, copy others, perhaps delete some
• ontology alignment
• start with two ontologies, end with three
• add relations between concepts, sometimes add intermediate concepts
• the two original ontologies stay unchanged
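The difference between the two approaches shows up in what each operation returns. A minimal sketch, assuming each ontology is a dict from concept to its set of broader concepts; the `merge`/`align` functions and the parrot data are hypothetical illustrations.

```python
def merge(a, b, same):
    """Ontology merging: two ontologies in, one out.
    a, b: dict concept -> set of broader concepts
    same: dict mapping concepts of b onto the concept of a they become."""
    def rename(c):
        return same.get(c, c)

    merged = {c: set(up) for c, up in a.items()}  # copy, so a is not modified
    for c, up in b.items():
        merged.setdefault(rename(c), set()).update(rename(x) for x in up)
    return merged


def align(a, b, links):
    """Ontology alignment: two ontologies in, three out.
    a and b are left untouched; the alignment is a separate list of
    (concept, relation, concept) links between them."""
    known = set(a) | set(b)
    return [(s, rel, o) for s, rel, o in links if s in known and o in known]


A = {"A:parrots": {"A:birds"}, "A:birds": {"A:animals"}, "A:animals": set()}
B = {"B:parrots": {"B:animal_kingdom"}, "B:animal_kingdom": set()}

merged = merge(A, B, {"B:parrots": "A:parrots", "B:animal_kingdom": "A:animals"})
alignment = align(A, B, [
    ("A:parrots", "exactMatch", "B:parrots"),
    ("A:animals", "exactMatch", "B:animal_kingdom"),
    ("A:birds", "broadMatch", "B:animal_kingdom"),
])
```

After merging, B’s concepts no longer exist as such; after aligning, A and B are exactly what they were, and the three links can be published, versioned, or discarded on their own.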
[figure: merging versus alignment of two small hierarchies (parrots, birds, animals, animal kingdom); merging yields one hierarchy, alignment keeps both and links them with exactMatch and broadMatch]
merging or alignment?
alignment.
why not merge?
[image: two pictures joined with “+” into one merged picture]
just kidding...
merging or alignment? alignment.
• alignment gives you more freedom to manage the combined resources in the future
• three important properties of ontology alignment:
1. the alignment itself is a separate collection
2. alignment relations allow subtle differences to be pointed out, but not removed
3. the original thesauri can keep their own separate lives, while applications can make combined use of them
issues with merging
• legal issues
• who owns the result?
• what about ownership of past and future versions?
• maintenance issues
• who is allowed to change the resulting thesaurus?
• who will pay for future modifications?
• security issues
• future changes might reveal confidential plans to other parties
• legacy issues
• software and internal policies will have to be adapted to deal with the new “world view”
different points of view
• alignment allows different points of view to coexist
• is that good or bad? – it’s better than bad, it’s good!
• you can always ignore the other perspectives, while benefiting from them whenever you like: “you never lose”
• sometimes it is very interesting to see where the meanings of concepts clash
• it is definitely good in cases where merging is politically impossible or cooperation is hard to organize
• on the web this is very common
dealing with differences
• within a thesaurus mixed points of view should be avoided, but when you cooperate they are unavoidable
• you have to deal with them one way or another
• you can sit together, work out who’s wrong, and update the ontologies
• you can ignore the problem and not link to each other
• you can describe the differences and decide how to deal with them whenever it becomes relevant
rdfs:subPropertyOf, skos:closeMatch, skos:broadMatch, etc. (as opposed to owl:sameAs or owl:equivalentClass) are your friends
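One reason owl:sameAs is risky: it is symmetric and transitive, so links chain together and can collapse concepts that one vocabulary deliberately keeps apart (skos:exactMatch is transitive too; skos:closeMatch is not). A minimal sketch that computes the groups that become a single individual when every link is read as owl:sameAs; the Ireland links are hypothetical, not taken from AGROVOC or NALT.

```python
from collections import defaultdict


def identity_classes(links):
    """Groups of concepts that become one individual if every link is
    read as owl:sameAs (symmetric and transitive)."""
    graph = defaultdict(set)
    for s, o in links:
        graph[s].add(o)
        graph[o].add(s)
    seen, classes = set(), []
    for start in list(graph):
        if start in seen:
            continue
        group, stack = set(), [start]
        while stack:  # depth-first walk of one connected component
            c = stack.pop()
            if c not in group:
                group.add(c)
                stack.extend(graph[c] - group)
        seen |= group
        classes.append(group)
    return classes


def collapsed(classes, vocab_prefix):
    """Groups holding two or more concepts from the same vocabulary:
    distinctions that vocabulary makes are lost under owl:sameAs."""
    return [g for g in classes
            if sum(c.startswith(vocab_prefix) for c in g) > 1]


# Hypothetical: vocabulary B has one ambiguous "Ireland" concept, and a
# matcher links both of A's Ireland concepts to it.
links = [("A:Ireland_island", "B:Ireland"), ("A:Ireland_state", "B:Ireland")]
classes = identity_classes(links)
print(collapsed(classes, "A:"))
```

Read as skos:closeMatch, the same two links assert only what they say; read as owl:sameAs, they make A’s island and A’s state one and the same thing.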
example: let’s think about the consequences
[figure: AGROVOC and NALT hierarchies around Ireland, Northern Ireland, United Kingdom, and British Isles (NALT also has Irish Republic), linked by broader, exactMatch, narrowMatch, and broadMatch relations]
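One consequence worth checking is what ends up above a concept once broader links can be followed across exactMatch links. A minimal sketch, assuming both hierarchies sit in one dict and exactMatch is read as a symmetric link that moves neither up nor down; the identifiers and links are hypothetical, loosely inspired by the Ireland example rather than read off it.

```python
from collections import deque


def broader_across(start, broader, exact_match):
    """Concepts that end up above `start` once two thesauri are combined.

    broader:     dict concept -> set of direct broader concepts (both thesauri)
    exact_match: iterable of (x, y) pairs, read as symmetric links that do
                 not move up or down the hierarchy
    A concept counts as 'above' if it is reached with at least one broader step.
    """
    same = {}
    for x, y in exact_match:
        same.setdefault(x, set()).add(y)
        same.setdefault(y, set()).add(x)
    # search over (concept, went_up) states
    seen = {(start, False)}
    queue = deque([(start, False)])
    above = set()
    while queue:
        node, went_up = queue.popleft()
        steps = [(n, True) for n in broader.get(node, ())]
        steps += [(n, went_up) for n in same.get(node, ())]
        for state in steps:
            if state not in seen:
                seen.add(state)
                queue.append(state)
                if state[1]:
                    above.add(state[0])
    return above


broader = {"A:Northern_Ireland": {"A:United_Kingdom"},
           "B:United_Kingdom": {"B:British_Isles"}}
exact = [("A:United_Kingdom", "B:United_Kingdom")]
print(sorted(broader_across("A:Northern_Ireland", broader, exact)))
```

Here A’s Northern Ireland ends up under B’s British Isles, a statement neither thesaurus makes on its own; whether such inherited statements are intended is the question to ask of every mapping.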
example: let’s think about the consequences
[figure: AGROVOC and NALT hierarchies around Andorra, Western European region / Western Europe, Europe, and named geographical regions, linked by broader, related, exactMatch, and narrowMatch relations]
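Another consequence check is whether the combined hierarchy places a concept strictly above itself, which typically exposes an inverted broadMatch/narrowMatch. A minimal sketch, assuming SKOS direction conventions (x broadMatch y: y is broader; x narrowMatch y: y is narrower) and hypothetical identifiers loosely inspired by the Andorra example.

```python
from collections import defaultdict, deque

UP, SAME = "up", "same"


def combined_graph(broader, broad_match, narrow_match, exact_match):
    """Directed graph over both thesauri; an UP edge points to a broader concept.
    x broadMatch y  (y broader)  -> edge x -> y
    x narrowMatch y (y narrower) -> edge y -> x
    x exactMatch y  -> traversable both ways without going up."""
    g = defaultdict(list)
    for x, y in list(broader) + list(broad_match):
        g[x].append((y, UP))
    for x, y in narrow_match:
        g[y].append((x, UP))
    for x, y in exact_match:
        g[x].append((y, SAME))
        g[y].append((x, SAME))
    return g


def above_themselves(g):
    """Concepts that reach themselves again via at least one UP edge."""
    bad = set()
    for start in list(g):
        seen = {(start, False)}
        queue = deque([(start, False)])
        while queue and start not in bad:
            node, went_up = queue.popleft()
            for nxt, kind in g.get(node, ()):
                state = (nxt, went_up or kind == UP)
                if state == (start, True):
                    bad.add(start)
                elif state not in seen:
                    seen.add(state)
                    queue.append(state)
    return bad


broader = [("A:Andorra", "A:Western_European_region"),
           ("B:Andorra", "B:Western_Europe"), ("B:Western_Europe", "B:Europe")]
exact = [("A:Andorra", "B:Andorra")]
ok = combined_graph(broader, [("A:Andorra", "B:Western_Europe")], [], exact)
wrong = combined_graph(broader, [], [("A:Andorra", "B:Western_Europe")], exact)
print(above_themselves(ok), sorted(above_themselves(wrong)))
```

In `wrong`, the narrowMatch says Western Europe is narrower than Andorra, so Andorra ends up above itself; with the correct broadMatch, nothing is flagged.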
a final remark: about power and the web
• in the past you gained the most power by constraining access to your information
• now you can also gain power by having people use your information and extend it for you
• sharing makes you a de facto authority: people use whatever works and is available
• sharing makes others do part of your work for you: when other people openly link their information to yours, you can also make use of the links
• consider benefiting from linked data, by publishing your own or by aligning with existing data
Linked Data: http://linkeddata.org
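Publishing an alignment as linked data can be as simple as writing its links as SKOS triples. A minimal sketch that serializes mappings to N-Triples using the real SKOS namespace and mapping properties; the concept URIs are hypothetical placeholders, and they are assumed to be valid IRIs already (no escaping is done).

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
MAPPING_PROPERTIES = {"exactMatch", "closeMatch", "broadMatch",
                      "narrowMatch", "relatedMatch"}


def to_ntriples(mappings):
    """Serialize (subject URI, SKOS mapping property, object URI) triples
    as N-Triples, one statement per line."""
    lines = []
    for s, prop, o in mappings:
        if prop not in MAPPING_PROPERTIES:
            raise ValueError(f"not a SKOS mapping property: {prop}")
        lines.append(f"<{s}> <{SKOS}{prop}> <{o}> .")
    return "\n".join(lines) + "\n"


print(to_ntriples([
    ("http://example.org/agrovoc/plant_parasitic_nematodes", "exactMatch",
     "http://example.org/nalt/phytonematodes"),
]), end="")
```

Restricting the output to SKOS mapping properties, rather than owl:sameAs, keeps the published links in line with the advice on coexisting points of view.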