Transcript
Page 1: Qomex2010

Considering the subjectivity to rationalise evaluation approaches: the example of Spoken Dialogue Systems

Marianne Laurent, Philippe Bretier (Orange Labs), Ioannis Kanellos (Telecom Bretagne). 23 June 2010, Qomex 2010, Trondheim, Norway

Page 2: Qomex2010

« I can't connect to the Internet! »

Spoken Dialogue Systems: evaluation?

[Architecture diagram: from speech understanding to system output, a chain of components built around an information system - sketched in code below:]
- Automatic Speech Recognition
- Spoken Language Understanding
- Dialogue Manager
- Spoken Language Generation
- Text-to-Speech

Evaluation is a complex task:
- Dynamic interactions: no comparison to an ideal (fidelity)
- Diversity of evaluator profiles, individualities and evaluation situations
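A minimal sketch of this component chain, assuming only the five component names shown on the slide; every class, parameter and stub below is illustrative, not part of the authors' systems:

```python
# A toy spoken-dialogue-system pipeline wired from the five components
# named on the slide. Every name here is an illustrative placeholder.

class SpokenDialogueSystem:
    """Chains ASR -> SLU -> DM -> SLG -> TTS around an information system."""

    def __init__(self, asr, slu, dm, slg, tts):
        self.asr = asr  # Automatic Speech Recognition: audio -> text
        self.slu = slu  # Spoken Language Understanding: text -> semantic frame
        self.dm = dm    # Dialogue Manager: frame -> dialogue act (may query the information system)
        self.slg = slg  # Spoken Language Generation: dialogue act -> text
        self.tts = tts  # Text-to-Speech: text -> audio

    def respond(self, user_audio):
        text = self.asr(user_audio)
        frame = self.slu(text)  # e.g. {"intent": "report_problem", "topic": "internet"}
        act = self.dm(frame)
        reply = self.slg(act)
        return self.tts(reply)


# Stub components make the chain runnable end to end:
sds = SpokenDialogueSystem(
    asr=lambda audio: "I can't connect to the Internet!",
    slu=lambda text: {"intent": "report_problem", "topic": "internet"},
    dm=lambda frame: {"act": "propose_line_test"},
    slg=lambda act: "Let us run a test on your line.",
    tts=lambda text: text,  # a real system would synthesise audio here
)
print(sds.respond(b"<audio>"))
```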

Page 3: Qomex2010

Internal review of evaluation methods: ad hoc protocols depending on the evaluator profile…

Laurent, M., Bretier, P. and Manquillet, C. (2010). Ad-hoc evaluations along the lifecycle of industrial spoken dialogue systems: heading to harmonisation? In LREC 2010, Malta.

Page 4: Qomex2010

Internal review of evaluation methods: ad hoc protocols… and depending on the evaluation context!

Laurent, M., Bretier, P. and Manquillet, C. (2010). Ad-hoc evaluations along the lifecycle of industrial spoken dialogue systems: heading to harmonisation? In LREC 2010, Malta.

http://www.slideshare.net/MarianneLo/lrecmlaurentposter

Page 5: Qomex2010

Toward one-size-fits-all evaluation protocols?

« Research has exerted considerable effort and attention to devising evaluation metrics that allow for comparison of disparate systems with various tasks and domains. » (Paek, 2007)

« A critical obstacle to progress in this area is the lack of a general framework for evaluating and comparing the performance of different dialogue agents. » (Walker et al., 1997)

« We see a multitude of highly interesting - but virtually incomparable - evaluation exercises, which address different aspects of quality, and which rely on different evaluation criteria. » (Möller, 2009)

Page 6: Qomex2010


Roadmap

1 Evaluation dependent on both context and evaluator

2 The evaluator as a mediator, an anthropocentric framework

3 Software implementation and anticipated added value

Page 7: Qomex2010

Instructions given to observers of the same scene in Yarbus's eye-tracking study - each task yields a different viewing behaviour:
- Free examination
- Give the age of the people
- Remember the clothes worn by the people
- Estimate material circumstances of the family
- Surmise what the family had been doing before the arrival of the unexpected visitor
- Remember positions of people and objects in the room

Yarbus, A. L. (1967). Eye Movements and Vision. Plenum, New York.

1 Evaluation, a rationalising contribution for a decision process

Page 8: Qomex2010

1 Evaluation, a goal-driven argumentation discourse

« Process through which one defines, obtains and delivers useful pieces of information to settle between the alternative possible decisions. » Daniel Stufflebeam

Stufflebeam, D. (1980). L'évaluation en éducation et la prise de décision [Evaluation in education and decision making]. Ottawa: Edition NHP.

Page 9: Qomex2010

2 V-Model process to define evaluations

Top-down trend (the situation is interpreted into evaluation needs and a procedure): nature of the decision to take → identify the objectives → define criteria → deduce the indicators → list the data to capture → experimental set-up.

Bottom-up trend (value judgment: the evaluator creates a meaning): capture the data → process data into indicators → note on a grid of criteria → meet the objectives? → take the final decision.

Compare: confront the results with the initial objectives. The two branches mirror each other step for step, as the sketch below spells out.
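The V-Model written down as data, a minimal sketch: the step labels come from the slide, while the pairing structure is my reading of the V shape, not the authors' formalisation:

```python
# Each top-down definition step (left) is mirrored by a bottom-up
# judgment step (right); deeper rows sit lower on the V.

V_MODEL = [
    ("Nature of the decision to take", "Take the final decision"),
    ("Identify the objectives",        "Meet the objectives?"),
    ("Define criteria",                "Note on a grid of criteria"),
    ("Deduce the indicators",          "Process data into indicators"),
    ("List the data to capture",       "Capture the data"),
]
# "Experimental set-up" sits at the bottom of the V, joining both branches.

for depth, (define, judge) in enumerate(V_MODEL):
    indent = "  " * depth
    print(f"{indent}{define}  <-->  {judge}")
print("  " * len(V_MODEL) + "Experimental set-up")
```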

Page 10: Qomex2010

2 A meta-model to define evaluations

Capture: log files, questionnaires, 3rd-party annotation, physio-metrics.
Data processing techniques.
Analysis: interaction performance, interaction quality, efficiency-related aspects, utility & usefulness, critical viewpoints, etc.

An evaluation traverses these layers either goal-driven (from analysis goals down to data capture) or data-driven (from captured data up to analysis) - see the sketch below.
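One way to encode the meta-model as data, assuming nothing beyond the layer contents on the slide; the dictionary encoding and traversal function are illustrative assumptions:

```python
# An evaluation selects capture instruments, processing techniques and
# analysis dimensions, moving goal-driven (analysis -> capture) or
# data-driven (capture -> analysis) through the layers.

META_MODEL = {
    "capture": ["log files", "questionnaires",
                "3rd-party annotation", "physio-metrics"],
    "processing": ["data processing techniques"],
    "analysis": ["interaction performance", "interaction quality",
                 "efficiency-related aspects", "utility & usefulness",
                 "critical viewpoints"],
}

def traverse(goal_driven: bool):
    """Return the layers in the order an evaluator would visit them."""
    order = (["analysis", "processing", "capture"] if goal_driven
             else ["capture", "processing", "analysis"])
    return [(layer, META_MODEL[layer]) for layer in order]

print(traverse(goal_driven=True))   # top-down definition of an evaluation
print(traverse(goal_driven=False))  # bottom-up, data-driven exploration
```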

Page 11: Qomex2010

2 A mediator within an “evaluation ecosystem”

The evaluator mediates between the elements of the ecosystem:
- Situation
- Corpus of evaluations
- Community of practice
- Normative system
- Rationalising system
- Demand system
- System of constraints
- Resources

Page 12: Qomex2010

3 Software implementation: MPOWERS (Multi Point Of vieW Evaluation Refine Studio)

- Data as collected in evaluation campaigns: user questionnaires, third-party annotations, log files
- Datamart
- Parameters, a descriptive view on the system (ITU-T Rec. P.Supp.24: parameters describing the interaction with SDS)
- KPIs, an analytical, statistical view on the system: define KPIs, retrieve KPIs & reports
- Dashboards: ad hoc selection of KPIs with potential graphics; personalised dashboards

A worked example of the parameters-to-KPI step follows below.
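A minimal sketch of deriving interaction parameters from a log file and a KPI on top of them, in the spirit of ITU-T Rec. P.Supp.24 (dialogue duration, number of turns); the field names, toy dialogue and KPI definition are illustrative assumptions, not MPOWERS code:

```python
from datetime import datetime

log = [  # one dialogue, as it might be captured during a campaign
    {"speaker": "system", "start": "12:00:00", "text": "Hotline, how can I help you?"},
    {"speaker": "user",   "start": "12:00:05", "text": "I can't connect to the Internet!"},
    {"speaker": "system", "start": "12:00:09", "text": "Let us run a test on your line."},
]

FMT = "%H:%M:%S"
t0 = datetime.strptime(log[0]["start"], FMT)
t1 = datetime.strptime(log[-1]["start"], FMT)

# Descriptive parameters (the "descriptive view on the system"):
dialogue_duration = (t1 - t0).total_seconds()
n_turns = len(log)

# One possible KPI (the "analytical, statistical view"):
mean_turn_duration = dialogue_duration / n_turns
print(f"DD = {dialogue_duration:.0f} s, turns = {n_turns}, "
      f"mean turn duration = {mean_turn_duration:.1f} s")
```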

Page 13: Qomex2010

3 Added value: impact both for the individual and the communities they belong to

Individual side: evaluation definition & refinement, retrieval of evaluation results. Community side: contribution & involvement, feedback & inspiration.

COOPERATE: contribute, as a knowledge-farming cooperative.
CONNECT: identify and create contact with relevant people.
COLLABORATE: feedback to refine evaluations; discuss/negotiate to converge toward common practices.

Communities of practice; communities of interest.

Page 14: Qomex2010

Merci! (Thank you!)

@warius

