© Paul Buitelaar – EON@ISWC07, November 2007, Busan, South-Korea
Evaluating Ontology Search
Towards Benchmarking in Ontology Search
Paul Buitelaar, Thomas Eigner
Competence Center Semantic Web & Language Technology Lab
DFKI GmbH
Saarbrücken, Germany
Overview
Ontology Search
  Knowledge reuse (integration with Ontology Learning)
OntoSelect
  Browse (ontologies, labels, classes, properties)
  Search by topic
Evaluating Ontology Search
  Benchmark (evaluation) data set
  Experiment (compare SWOOGLE, OntoSelect)
Conclusions
Ontology Search
  There are more and more ontologies published on the (Semantic) Web
    Available as RDFS or OWL files (also still DAML)
  Opens up possibilities for reuse of knowledge
    Access through ontology search engines and/or (manual/automatic) organization in ontology libraries
  But: increasingly harder to find the right one for your application
    Increasing research in ontology search/selection (Alani et al., Buitelaar et al., Ding et al., Sabou et al.): SWOOGLE, OntoSelect, Watson
OntoSelect Ontology Library and Search Engine
http://olp.dfki.de/OntoSelect
Monitors the web for ontologies with automatic harvesting and indexing
Browse and search
  On ontologies, classes, properties and (multilingual) labels
  Ontology search integrates relevance feedback over Wikipedia for the search term
Ontology publishing
  Submit ontologies - will be automatically integrated
Statistics
  On formats, languages, labels used, ontology publishing
Paul Buitelaar, Thomas Eigner, Thierry Declerck. OntoSelect: A Dynamic Ontology Library with Support for Ontology Selection. In: Proc. of the Demo Session at the International Semantic Web Conference, Hiroshima, Japan, Nov. 2004.
OntoSelect – Browse
Ontology Search
Keyword as Wikipedia Topic
Keyword Expansion (Extraction)
Relevance Feedback from Wikipedia
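One way to read the keyword expansion step is: take the text of the Wikipedia page for the search term and extract its most frequent content words as additional query terms. The following is a minimal sketch under that assumption. The slides do not specify how OntoSelect actually extracts keywords, and the function name `expand_keywords`, the stopword list, and the sample page text are hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (illustration only).
STOPWORDS = {"the", "a", "an", "of", "and", "in", "is", "to", "for",
             "by", "or", "are", "on", "as", "that", "with"}

def expand_keywords(topic, page_text, k=5):
    """Return the topic plus up to k frequent content words from page_text.

    Assumption: plain term frequency stands in for whatever extraction
    method the real system uses.
    """
    topic = topic.lower()
    tokens = re.findall(r"[a-z]+", page_text.lower())
    counts = Counter(
        t for t in tokens
        if t not in STOPWORDS and t != topic and len(t) > 2
    )
    return [topic] + [term for term, _ in counts.most_common(k)]

# Made-up page text standing in for a fetched Wikipedia article.
page = ("Tourism is travel for pleasure or business. Tourism covers hotel "
        "stays, travel agencies and hotel booking for travel.")
print(expand_keywords("Tourism", page, k=2))
```

The expanded term list would then be matched against ontology labels in place of the single search term.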
Ranked Results (Browsable)
Search Criteria
Relevance criteria address ontology content, structure, status:
  Coverage - Term Matching
    How many of the terms in a text collection are covered by labels for classes and properties?
  Structure - Properties Relative to Classes
    How detailed is the knowledge structure that the ontology represents?
  Connectedness - Number of Included Ontologies
    Is the ontology connected to other ontologies and how well established are these?
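The three criteria can be sketched as simple functions over per-ontology counts. This is an illustration only: the `Ontology` record, the linear combination in `score`, and the default weights are assumptions, not the formulas OntoSelect uses, and the "how well established" part of connectedness is not modeled.

```python
from dataclasses import dataclass

@dataclass
class Ontology:
    labels: set        # lower-cased class and property labels
    n_classes: int
    n_properties: int
    n_imports: int     # number of included ontologies

def coverage(onto, terms):
    """Fraction of the given terms that match a class/property label."""
    terms = {t.lower() for t in terms}
    return len(terms & onto.labels) / len(terms) if terms else 0.0

def structure(onto):
    """Properties relative to classes."""
    return onto.n_properties / onto.n_classes if onto.n_classes else 0.0

def connectedness(onto):
    """Number of included ontologies."""
    return float(onto.n_imports)

def score(onto, terms, weights=(1.0, 1.0, 1.0)):
    """Hypothetical weighted sum of the three criteria."""
    w_cov, w_str, w_con = weights
    return (w_cov * coverage(onto, terms)
            + w_str * structure(onto)
            + w_con * connectedness(onto))

# Made-up example ontology and term list.
onto = Ontology(labels={"hotel", "city", "travel"},
                n_classes=4, n_properties=2, n_imports=1)
terms = ["Hotel", "Travel", "Museum", "Beach"]
print(coverage(onto, terms), structure(onto), score(onto, terms))
```

Changing the `weights` tuple corresponds to the "weighting of relevance criteria" configurations mentioned in the experiment.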
Evaluation – Benchmark
Benchmark: 15 Wikipedia topics and 57 manually assigned ontologies out of 1056 cached through OntoSelect
The 15 Wikipedia topics were selected out of the set of all (37284) class/property labels in OntoSelect, by:
  Filtering out labels that did not correspond to a Wikipedia page -> 5658 labels / topics
  Using the 5658 labels as search terms in SWOOGLE to filter out labels that returned fewer than 10 ontologies (out of the 1056 in OntoSelect) -> 3084 labels / topics
  Manually selecting useful topics out of the 3084 labels, e.g. we left out very short labels (‘v’) and very abstract ones (‘thing’) -> 50 topics
  Randomly selecting 15 of these, for which we manually checked the ontologies retrieved from OntoSelect and SWOOGLE -> 15 topics with 57 assigned ontologies
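The automatic parts of this selection procedure can be sketched as a small filtering pipeline. The two callables `has_wikipedia_page` and `hit_count` are hypothetical stand-ins for the Wikipedia lookup and the SWOOGLE query; the manual selection step and the manual relevance check are not automated here.

```python
import random

def candidate_topics(labels, has_wikipedia_page, hit_count, min_hits=10):
    """Keep labels that have a Wikipedia page and return enough ontologies.

    has_wikipedia_page(label) -> bool and hit_count(label) -> int are
    assumed helpers supplied by the caller.
    """
    with_page = [label for label in labels if has_wikipedia_page(label)]
    return [label for label in with_page if hit_count(label) >= min_hits]

def sample_topics(selected, n=15, seed=0):
    """Randomly pick n topics from the manually selected ones."""
    rng = random.Random(seed)
    return rng.sample(selected, min(n, len(selected)))

# Toy data standing in for the real label set and search results.
labels = ["city", "v2x", "oil", "thing"]
wiki_pages = {"city", "oil", "thing"}
hits = {"city": 12, "oil": 3, "thing": 40}

candidates = candidate_topics(
    labels,
    has_wikipedia_page=lambda label: label in wiki_pages,
    hit_count=lambda label: hits.get(label, 0),
)
print(candidates)
```

In the procedure described above, a manual pass (dropping labels such as ‘v’ or ‘thing’) sits between `candidate_topics` and `sample_topics`.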
Evaluation – Benchmark by Topic
15 (Wikipedia) topics with number of assigned ontologies:
  Atmosphere (2)
  Biology (11)
  City (3)
    http://www.mindswap.org/2003/owl/geo/geoFeatures.owl
    http://www.glue.umd.edu/ katyn/CMSC828y/location.daml
    http://www.daml.org/2001/02/geofile/geofile-ont
  Communication (10)
  Economy (1)
  Infrastructure (2)
  Institution (1)
  Math (3)
  Military (5)
  Newspaper (2)
  Oil (0)
  Production (1)
  Publication (6)
  Railroad (1)
  Tourism (9)
Evaluation – Experiment
Comparison of (average) results between SWOOGLE and OntoSelect
Use OntoSelect benchmark
  15 topics (queries)
  57 assigned ontologies (relevance assessments)
  1056 ontologies (data set)
Use different configurations for OntoSelect
  With/without keyword expansion/extraction
  With/without class names (in addition to labels)
  With/without property labels
  Weighting of relevance criteria
  …
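The transcript does not state which measure was averaged. As an assumption, the sketch below uses precision and recall over the top-k results per topic, macro-averaged over topics; the function names and the toy data are hypothetical. Topics with no assigned ontologies (such as Oil) have undefined recall and are skipped in the recall average.

```python
def precision_recall(ranked, relevant, k=10):
    """Precision and recall of the top-k ranked ontologies for one topic.

    Recall is None when the topic has no assigned ontologies.
    """
    top = ranked[:k]
    hits = sum(1 for url in top if url in relevant)
    precision = hits / len(top) if top else 0.0
    recall = hits / len(relevant) if relevant else None
    return precision, recall

def average_over_topics(results, benchmark, k=10):
    """Macro-average precision and recall over all benchmark topics."""
    precisions, recalls = [], []
    for topic, relevant in benchmark.items():
        p, r = precision_recall(results.get(topic, []), relevant, k)
        precisions.append(p)
        if r is not None:
            recalls.append(r)

    def mean(values):
        return sum(values) / len(values) if values else 0.0

    return mean(precisions), mean(recalls)

# Toy benchmark and engine output (not the real assessments).
benchmark = {"city": {"geo.owl", "loc.daml"}, "oil": set()}
results = {"city": ["geo.owl", "x.owl"], "oil": ["y.owl"]}
print(average_over_topics(results, benchmark))
```

Running the same function on the output of each engine or configuration gives directly comparable averages.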
Evaluation – Results
Evaluation – Weighting of ‘title’
Conclusions
It is too early to draw conclusions from the evaluation
  Many more configurations (weights) to compare
  Extend the benchmark
  Comparison with other ontology search engines
Main contribution of the presented work
  First comprehensive benchmark for topic-driven evaluation of ontology search
  (Extended) Benchmark will be made publicly available
http://olp.dfki.de/OntoSelect