
Methodology and Campaign Design for the Evaluation of Semantic Search Tools

Description:
The main problem with the state of the art in the semantic search domain is the lack of comprehensive evaluations. There exist only a few efforts to evaluate semantic search tools and to compare the results with other evaluations of their kind. In this paper, we present a systematic approach for testing and benchmarking semantic search tools that was developed within the SEALS project. Unlike other semantic web evaluations, our methodology tests search tools both automatically and interactively with a human user in the loop. This allows us to test not only functional performance measures, such as precision and recall, but also usability issues, such as ease of use and comprehensibility of the query language. The paper describes the evaluation goals and assumptions; the criteria and metrics; the type of experiments we will conduct, as well as the datasets required to conduct the evaluation in the context of the SEALS initiative. To our knowledge, it is the first effort to present a comprehensive evaluation methodology for Semantic Web search tools.
Transcript
Page 1: Methodology and Campaign Design for the Evaluation of Semantic Search Tools

Methodology and Campaign Design for the Evaluation of Semantic Search Tools

Stuart N. Wrigley1, Dorothee Reinhard2, Khadija Elbedweihy1, Abraham Bernstein2, Fabio Ciravegna1


1 University of Sheffield, UK
2 University of Zurich, Switzerland

Page 2: Methodology and Campaign Design for the Evaluation of Semantic Search Tools

Outline

• SEALS initiative
• Evaluation design
  – Criteria
  – Two phase approach
  – API
  – Workflow
• Data
• Results and Analyses
• Conclusions


Page 3: Methodology and Campaign Design for the Evaluation of Semantic Search Tools

SEALS INITIATIVE


Page 4: Methodology and Campaign Design for the Evaluation of Semantic Search Tools

SEALS goals

• Develop and diffuse best practices in evaluation of semantic technologies
• Create a lasting reference infrastructure for semantic technology evaluation
  – This infrastructure will be the SEALS Platform
• Organise two worldwide Evaluation Campaigns
  – One this summer
  – Next in late 2011 / early 2012
• Facilitate the continuous evaluation of semantic technologies
• Allow easy access to both:
  – evaluation results (for developers and researchers)
  – technology roadmaps (for non-technical adopters)
• Transfer all infrastructure to the community


Page 5: Methodology and Campaign Design for the Evaluation of Semantic Search Tools

Targeted technologies

Five different types of semantic technologies:
• Ontology Engineering tools
• Ontology Storage and Reasoning Systems
• Ontology Matching tools
• Semantic Web Service tools
• Semantic Search tools


Page 6: Methodology and Campaign Design for the Evaluation of Semantic Search Tools

What’s our general approach?

• Low overhead to the participant
  – Automate as far as possible
  – We provide the compute
  – We initiate the actual evaluation run
  – We perform the analysis
• Provide infrastructure for more than simply running high-profile evaluation campaigns
  – reuse existing evaluations for your personal testing
  – create new evaluations
  – store / publish / download test data sets
• Encourage participation in evaluation campaign definitions and design
• Open Source (Apache 2.0)


Page 7: Methodology and Campaign Design for the Evaluation of Semantic Search Tools

SEALS Platform


[Architecture diagram: the SEALS Service Manager coordinates the Result Repository Service, Tool Repository Service, Test Data Repository Service, Evaluation Repository Service and Runtime Evaluation Service; the platform is used by Technology Developers, Evaluation Organisers and Technology Users.]

Page 8: Methodology and Campaign Design for the Evaluation of Semantic Search Tools

SEARCH EVALUATION DESIGN


Page 9: Methodology and Campaign Design for the Evaluation of Semantic Search Tools

What do we want to do?

• Evaluate / benchmark semantic search tools
  – with respect to their semantic peers
• Allow as wide a range of interface styles as possible
• Assess tools on the basis of a number of criteria, including usability
• Automate (part of) it


Page 10: Methodology and Campaign Design for the Evaluation of Semantic Search Tools


Evaluation criteria

User-centred search methodologies will be evaluated according to the following criteria:

• Query expressiveness
  – Is the style of interface suited to the type of query?
  – How complex can the queries be?
• Usability (effectiveness, efficiency, satisfaction)
  – How easy is the tool to use?
  – How easy is it to formulate the queries?
  – How easy is it to work with the answers?
• Scalability
  – Ability to cope with a large ontology
  – Ability to query a large repository in a reasonable time
  – Ability to cope with a large amount of results returned
• Quality of documentation
  – Is it easy to understand?
  – Is it well structured?
• Performance (resource consumption)
  – execution time (speed)
  – CPU load
  – memory required

Page 11: Methodology and Campaign Design for the Evaluation of Semantic Search Tools


Two phase approach

• Semantic search tool evaluation demands a user-in-the-loop phase
  – usability criterion
• Two phases:
  – User-in-the-loop
  – Automated

Page 12: Methodology and Campaign Design for the Evaluation of Semantic Search Tools


Evaluation criteria

Each phase will address a different subset of criteria.

• Automated evaluation: query expressiveness, scalability, performance, quality of documentation

• User-in-the-loop: usability, query expressiveness

Page 13: Methodology and Campaign Design for the Evaluation of Semantic Search Tools

RUNNING THE EVALUATION


Page 14: Methodology and Campaign Design for the Evaluation of Semantic Search Tools

Automated evaluation


• Tools uploaded to platform. Includes:
  – wrapper implementing API
  – supporting libraries
• Test data and questions stored on platform
• Workflow specifies details of evaluation sequence
• Evaluation executed offline in batch mode
• Results stored on platform
• Analyses performed and stored on platform

[Diagram: the search tool, wrapped by the API, connects to the Runtime Evaluation Service on the SEALS Platform.]

Page 15: Methodology and Campaign Design for the Evaluation of Semantic Search Tools


User in the loop evaluation

• Performed at tool provider site
• All materials provided
  – Controller software
  – Instructions (leader and subjects)
  – Questionnaires
• Data downloaded from platform
• Results uploaded to platform

[Diagram: on the tool provider machine, the search tool is wrapped by the API and driven by the Controller, which communicates with the SEALS Platform over the web.]

Page 16: Methodology and Campaign Design for the Evaluation of Semantic Search Tools

API

• A range of information needs to be acquired from the tool in both phases

• In the automated phase, the tool has to be executed and interrogated with no human assistance.

• The interface between the SEALS platform and the tool must be formalised


Page 17: Methodology and Campaign Design for the Evaluation of Semantic Search Tools

API – common

• Load ontology
  – success / failure informs the interoperability measure
• Determine result type
  – ranked list or set?
• Results ready?
  – used to determine execution time
• Get results
  – list of URIs
  – number of results to be determined by developer
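
The slide lists the common calls but not their signatures, so the following is only a minimal Java sketch of what such a wrapper contract could look like; all names are illustrative, not the published SEALS API:

```java
import java.util.List;

/**
 * Illustrative sketch of the common wrapper calls described above.
 * Method and type names are hypothetical; the real SEALS API may differ.
 */
public interface SearchToolWrapper {

    /** Possible shapes of a tool's result. */
    enum ResultType { RANKED_LIST, SET }

    /** Load the ontology; success or failure feeds the interoperability measure. */
    boolean loadOntology(String ontologyLocation);

    /** Whether the tool returns a ranked list or an unordered set. */
    ResultType getResultType();

    /** Polled by the platform; the delay until this returns true gives the execution time. */
    boolean isResultsReady();

    /** The results as a list of URIs; how many to return is left to the tool developer. */
    List<String> getResults();
}
```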


Page 18: Methodology and Campaign Design for the Evaluation of Semantic Search Tools

API – user in the loop

• User query input complete?
  – used to determine input time
• Get user query
  – String representation of the user’s query
  – if NL interface, same as the text entered
• Get internal query
  – String representation of the internal query
  – for use with…
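
Continuing the illustrative sketch above, the user-in-the-loop calls might look like this (again, hypothetical names, not the published API):

```java
/** Hypothetical user-in-the-loop extension of the sketch above. */
public interface UserInTheLoopWrapper extends SearchToolWrapper {

    /** Polled by the controller to determine how long the user took to formulate the query. */
    boolean isUserQueryInputComplete();

    /** String representation of the user's query; for an NL interface, the entered text. */
    String getUserQuery();

    /** String representation of the tool's internal query (e.g. SPARQL). */
    String getInternalQuery();
}
```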


Page 19: Methodology and Campaign Design for the Evaluation of Semantic Search Tools

API – automated

• Execute query
  – must not constrain the tool to a particular query format
  – tool provider given questions shortly before evaluation is executed
  – tool provider converts those questions into some form of ‘internal representation’ which can be serialised as a String
  – serialised internal representation passed to this method
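
In the same illustrative sketch, the automated phase then needs only a single entry point that accepts the provider's serialised internal representation (hypothetical signature):

```java
/** Hypothetical automated-phase extension of the sketch above. */
public interface AutomatedWrapper extends SearchToolWrapper {

    /**
     * Execute one benchmark query. The argument is the serialised 'internal
     * representation' the tool provider prepared from the benchmark questions;
     * its format is entirely up to the provider.
     */
    void executeQuery(String serialisedInternalQuery);
}
```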


Page 20: Methodology and Campaign Design for the Evaluation of Semantic Search Tools

DATA


Page 21: Methodology and Campaign Design for the Evaluation of Semantic Search Tools


Data set – user in the loop

• Mooney Natural Language Learning Data
  – used by previous semantic search evaluation
  – simple and well-known domain
  – using geography subset
    • 9 classes
    • 11 datatype properties
    • 17 object properties
    • 697 instances
  – 877 questions already available

Page 22: Methodology and Campaign Design for the Evaluation of Semantic Search Tools

Data set – automated

• EvoOnt
  – set of object-oriented software source code ontologies
  – easy to create different ABox sizes given a TBox
  – 5 data set sizes: 1k, 10k, 100k, 1M, 10M triples
  – questions generated by software engineers


Page 23: Methodology and Campaign Design for the Evaluation of Semantic Search Tools

RESULTS AND ANALYSES


Page 24: Methodology and Campaign Design for the Evaluation of Semantic Search Tools

Questionnaires

3 questionnaires:
• SUS questionnaire
• Extended questionnaire
  – similar to SUS in terms of type of question but more detailed
• Demographics questionnaire


Page 25: Methodology and Campaign Design for the Evaluation of Semantic Search Tools


System Usability Scale (SUS) score

• SUS is a Likert scale
• 10-item questionnaire
• Each question has 5 levels (strongly disagree to strongly agree)
• SUS scores have a range of 0 to 100
• A score of around 60 and above is generally considered an indicator of good usability
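
For reference, the standard SUS scoring rule (not specific to SEALS) maps the ten 1–5 responses onto a 0–100 score; a minimal Java sketch:

```java
/** Standard SUS scoring: ten items, each answered 1 (strongly disagree) to 5 (strongly agree). */
public final class SusScore {

    /** Returns the SUS score in the range 0-100 for one completed questionnaire. */
    public static double compute(int[] responses) {
        if (responses.length != 10) {
            throw new IllegalArgumentException("SUS has exactly 10 items");
        }
        int sum = 0;
        for (int i = 0; i < 10; i++) {
            // Odd-numbered items (1st, 3rd, ...) contribute (response - 1),
            // even-numbered items contribute (5 - response).
            sum += (i % 2 == 0) ? responses[i] - 1 : 5 - responses[i];
        }
        return sum * 2.5;
    }
}
```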

Page 26: Methodology and Campaign Design for the Evaluation of Semantic Search Tools


Demographics
• Age
• Gender
• Profession
• Number of years in education
• Highest qualification
• Number of years in employment
• Knowledge of informatics
• Knowledge of linguistics
• Knowledge of formal query languages
• Knowledge of English
• …

Page 27: Methodology and Campaign Design for the Evaluation of Semantic Search Tools

Automated

Results
• Execution success (OK / FAIL / PLATFORM ERROR)
• Triples returned
• Time to execute each query
• CPU load, memory usage

Analyses
• Ability to load ontology and query (interoperability)
• Precision and Recall (search accuracy and query expressiveness)
• Tool robustness: ratio of all benchmarks executed to number of failed executions
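
Precision and recall here follow the usual definitions over the set of results returned by the tool, R, and the gold-standard answers, G:

```latex
% Standard definitions; R = results returned by the tool, G = gold-standard answers.
\[
  \mathrm{Precision} = \frac{|R \cap G|}{|R|}, \qquad
  \mathrm{Recall} = \frac{|R \cap G|}{|G|}
\]
```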


Page 28: Methodology and Campaign Design for the Evaluation of Semantic Search Tools


User in the loop

Results (other than core results similar to automated phase)
• Query captured by the tool
• Underlying query (e.g., SPARQL)
• Is answer in result set? (user may try a number of queries before being successful)
• Time required to obtain answer
• Number of queries required to answer question

Analyses
• Precision and Recall
• Correlations between results and SUS scores, demographics, etc.

Page 29: Methodology and Campaign Design for the Evaluation of Semantic Search Tools

Dissemination

• Results browsable on the SEALS portal
• Split into three areas:
  – performance
  – usability
  – comparison between tools


Page 30: Methodology and Campaign Design for the Evaluation of Semantic Search Tools

CONCLUSIONS


Page 31: Methodology and Campaign Design for the Evaluation of Semantic Search Tools

Conclusions

• Methodology and design of a semantic search tool evaluation campaign

• Exists within the wider context of the SEALS initiative
• First version
  – feedback from participants and community will drive the design of the second campaign
• Emphasis on the user experience (for search)
  – Two phase approach


Page 32: Methodology and Campaign Design for the Evaluation of Semantic Search Tools


THANK YOU

Page 33: Methodology and Campaign Design for the Evaluation of Semantic Search Tools

Get involved!

• First Evaluation Campaign in all SEALS technology areas this Summer

• Get involved – your input and participation are crucial
• Workshop planned for ISWC 2010 after the campaign
• Find out more (and take part!) at: http://www.seals-project.eu or talk to me, or email me ([email protected])


Page 34: Methodology and Campaign Design for the Evaluation of Semantic Search Tools

Timeline

• May 2010: Registration opens
• May-June 2010: Evaluation materials and documentation are provided to participants
• July 2010: Participants upload their tools
• August 2010: Evaluation scenarios are executed
• September 2010: Evaluation results are analysed
• November 2010: Evaluation results are discussed at ISWC 2010 workshop (tbc)


Page 35: Methodology and Campaign Design for the Evaluation of Semantic Search Tools

Best paper award

SEALS is proud to be sponsoring the best paper award here at SemSearch2010

Congratulations to the winning authors!


