Transcript
Page 1: Cross-Language Evaluation Forum (CLEF)

Cross-Language Evaluation Forum (CLEF)

CLEF 2006 - Robust Overview

Thomas Mandl
Information Science, Universität Hildesheim
[email protected]

7th Workshop of the Cross-Language Evaluation Forum (CLEF), Alicante, 23 Sept. 2006

Page 2: Cross-Language Evaluation Forum (CLEF)

Why Robust?

The user never sees the perspective of an evaluation (= MAP), but only the performance on his or her own request(s).

Page 3: Cross-Language Evaluation Forum (CLEF)

Why Robust?

«The unhappy customer, on average, will tell 27 other people about their experience. With the use of the internet, whether web pages or e-mail, that number can increase to the thousands …»

«Dissatisfied customers tell an average of ten other people about their bad experience. Twelve percent tell up to twenty people.»

→ Bad news travels fast.

Slide credit: Jacques Savoy

Page 4: Cross-Language Evaluation Forum (CLEF)

Why Robust?

On the other hand, satisfied customers will tell an average of five people about their positive experience.

→ Good news travels somewhat slower.

Your system should produce less bad news: improve on the hard topics. Don't worry too much about the good news (the best topics).

Slide credit: Jacques Savoy

Page 5: Cross-Language Evaluation Forum (CLEF)

Which system is better?

Topic    System A   System B
1        0.1        0.2
2        0.1        0.2
3        0.9        0.6
GeoAve   0.21       0.29
MAP      0.37       0.33

[Figure: bar chart of Result A and Result B per topic (I, II, III); y-axis 0 to 1, x-axis: Topics]

geoAve = \sqrt[n]{\prod_{i=1}^{n} x_i}

i.e. the geometric average is the n-th root of the product of the n per-topic results x_i.
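To make the contrast concrete, here is a minimal sketch (Python, not part of the original slides) that reproduces the numbers in the table above. Note that practical GMAP implementations usually add a small constant to each per-topic score so that a single zero does not wipe out the whole product; that smoothing is omitted here.

```python
import math

def mean_average_precision(scores):
    """Arithmetic mean of the per-topic average precision values (MAP)."""
    return sum(scores) / len(scores)

def geometric_average(scores):
    """n-th root of the product of the per-topic values (unsmoothed geoAve)."""
    return math.prod(scores) ** (1.0 / len(scores))

# Per-topic results from the slide
system_a = [0.1, 0.1, 0.9]
system_b = [0.2, 0.2, 0.6]

for name, scores in (("A", system_a), ("B", system_b)):
    print(f"System {name}: MAP={mean_average_precision(scores):.2f} "
          f"GeoAve={geometric_average(scores):.2f}")
# System A: MAP=0.37 GeoAve=0.21
# System B: MAP=0.33 GeoAve=0.29
```

System A wins on MAP because of one very good topic, while System B wins on the geometric average because it never fails badly; this is exactly the robustness argument of the track.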

Page 6: Cross-Language Evaluation Forum (CLEF)

Purpose: Robustness in CLIR

• Robustness in multilingual retrieval
  – Stable performance over all topics instead of high average performance (as at TREC, but for other languages)
  – Stable performance over all topics for multi-lingual retrieval
  – Stable performance over different languages (so far at CLEF?)

Page 7: Cross-Language Evaluation Forum (CLEF)

Ultimate Goal

• "More work needs to be done on customizing methods for each topic" (Harman 2005)

• Hard, but some work has been done:
  – RIA Workshop
  – SIGIR Workshop on Topic Difficulty
  – TREC Robust Track (until 2005)
  – ...

Page 8: Cross-Language Evaluation Forum (CLEF)

Data Collection

Language   Collection
English    LA Times 94, Glasgow Herald 95
French     ATS (SDA) 94/95, Le Monde 94
Italian    La Stampa 94, AGZ (SDA) 94/95
Dutch      NRC Handelsblad 94/95, Algemeen Dagblad 94/95
German     Frankfurter Rundschau 94/95, Spiegel 94/95, SDA 94
Spanish    EFE 94/95

• Six languages
• 1.35 million documents
• 3.6 gigabytes of text

Page 9: Cross-Language Evaluation Forum (CLEF)

Data Collections

[Figure: topics, documents and relevance judgements per CLEF year – topics #41-90 (CLEF 2001), #91-140 (CLEF 2002), #141-200 (CLEF 2003); relevance judgements are partly missing]

Page 10: Cross-Language Evaluation Forum (CLEF)

Test and Training Set

• Arbitrary split of the 160 topics
• 60 training topics
• 100 test topics
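The actual assignment of topics was fixed by the track organizers; purely as an illustration of the 60/100 split, a random partition of the 160 reused topic numbers could look like the sketch below (the seed and the resulting assignment are assumptions, not the official split).

```python
import random

# Hypothetical illustration: split the 160 reused topics (#41-#200)
# into 60 training topics and 100 test topics.
topic_ids = list(range(41, 201))      # topics #41-#200 from CLEF 2001-2003
assert len(topic_ids) == 160

rng = random.Random(2006)             # fixed seed so the split is reproducible
rng.shuffle(topic_ids)

training_topics = sorted(topic_ids[:60])
test_topics = sorted(topic_ids[60:])
print(len(training_topics), len(test_topics))   # 60 100
```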

Page 11: Cross-Language Evaluation Forum (CLEF)

Identification of Difficult Topics

• Finding a set of difficult topics ... was difficult
  – Before the campaign, no topics were found which were difficult for more than one language or task at CLEF 2001, CLEF 2002 and CLEF 2003
  – Several definitions of difficulty were tested

• Just as at the Robust Track @ TREC:
  – Topics are not difficult by themselves
  – but only in interaction with a collection
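Since several definitions of difficulty were tested, here is one plausible definition as a sketch (an assumption for illustration, not necessarily one of the definitions actually used): a topic counts as difficult if the median average precision over all submitted runs on a collection stays below a threshold. The topic numbers are taken from the examples later in the talk, but the scores are invented toy values.

```python
from statistics import median

def difficult_topics(ap_by_run, threshold=0.1):
    """One possible difficulty definition (illustrative assumption):
    a topic is difficult if the median average precision across all runs
    is below the threshold.

    ap_by_run: dict mapping run name -> dict mapping topic id -> average precision
    """
    topics = next(iter(ap_by_run.values())).keys()
    hard = []
    for topic in topics:
        scores = [run_scores[topic] for run_scores in ap_by_run.values()]
        if median(scores) < threshold:
            hard.append(topic)
    return hard

# Toy example with invented numbers
runs = {
    "runA": {64: 0.02, 144: 0.35, 146: 0.04},
    "runB": {64: 0.05, 144: 0.40, 146: 0.03},
}
print(difficult_topics(runs))   # [64, 146]
```

Because such a definition depends on the runs and the collection, the same topic can come out as hard in one sub-task and easy in another, which matches the observation above that topics are only difficult in interaction with a collection.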

Page 12: Cross-Language Evaluation Forum (CLEF)

Task Design for Robustness

• Sub-tasks
  – Mono-lingual: only the six document languages as topic languages
  – Bi-lingual: it->es, fr->nl, en->de
  – Multi-lingual

Page 13: Cross-Language Evaluation Forum (CLEF)

Sub-Tasks

• Submission of runs on the training topics was encouraged
  – allows further analysis of topic difficulty
• Overall, systems did better on the test topics

Page 14: Cross-Language Evaluation Forum (CLEF)

ParticipantsParticipantsParticipantsParticipants

• U. Coruna & U. Sunderland (Spain & UK)

• U. Jaen (Spain)• DAEDALUS & Madrid

Univs. (Spain)• U. Salamanca – REINA

(Spain)

• Hummingbird Core Tech. (Canada)

• U. Neuchatel (Switzerland)

• Dublin City U. – Computing (Ireland)

• U. Hildesheim – Inf. Sci. (Germany)

Page 15: Cross-Language Evaluation Forum (CLEF)

More than 100 Runs ...

Task    Language   Test runs   Training runs   Groups
mono    en         13          7               6
mono    fr         18          10              7
mono    nl         7           3               3
mono    de         7           3               3
mono    es         11          5               5
mono    it         11          5               5
bi      it->es     8           2               3
bi      fr->nl     4           0               1
bi      en->de     5           1               2
multi   multi      10          3               4

Page 16: Cross-Language Evaluation Forum (CLEF)

Results – Mono-lingual

Dutch        1st            2nd            3rd              Diff. (1st vs 3rd)
Participant  hummingbird    daedalus       colesir
MAP          51.06%         42.39%         41.60%           22.74%
GMAP         25.76%         17.57%         16.40%           57.13%
Run          humNL06Rtde    nlFSnlR2S      CoLesIRnlTst

English      1st            2nd            3rd              4th          5th            Diff. (1st vs 5th)
Participant  hummingbird    reina          dcu              daedalus     colesir
MAP          47.63%         43.66%         43.48%           39.69%       37.64%         26.54%
GMAP         11.69%         10.53%         10.11%           8.93%        8.41%          39.00%
Run          humEN06Rtde    reinaENtdtest  dcudesceng12075  enFSenR2S    CoLesIRenTst

French       1st            2nd            3rd              4th              5th            Diff. (1st vs 5th)
Participant  unine          hummingbird    reina            dcu              colesir
MAP          47.57%         45.43%         44.58%           41.08%           39.51%         20.40%
GMAP         15.02%         14.90%         14.32%           12.00%           11.91%         26.11%
Run          UniNEfrr1      humFR06Rtde    reinaFRtdtest    dcudescfr12075   CoLesIRfrTst

Page 17: Cross-Language Evaluation Forum (CLEF)

Results – Mono-lingual

German       1st            2nd            3rd              Diff. (1st vs 3rd)
Participant  hummingbird    colesir        daedalus
MAP          48.30%         37.21%         34.06%           41.81%
GMAP         22.53%         14.80%         10.61%           112.35%
Run          humDE06Rtde    CoLesIRdeTst   deFSdeR2S

Italian      1st            2nd            3rd              4th          5th            Diff. (1st vs 5th)
Participant  hummingbird    reina          dcu              daedalus     colesir
MAP          41.94%         38.45%         37.73%           35.11%       32.23%         30.13%
GMAP         11.47%         10.55%         9.19%            10.50%       8.23%          39.37%
Run          humIT06Rtde    reinaITtdtest  dcudescit1005    itFSitR2S    CoLesIRitTst

Spanish      1st            2nd            3rd              4th          5th            Diff. (1st vs 5th)
Participant  hummingbird    reina          dcu              daedalus     colesir
MAP          45.66%         44.01%         42.14%           40.40%       40.17%         13.67%
GMAP         23.61%         22.65%         21.32%           19.64%       18.84%         25.32%
Run          humES06Rtde    reinaEStdtest  dcudescsp12075   esFSesR2S    CoLesIResTst

Page 18: Cross-Language Evaluation Forum (CLEF)

Results – Mono-lingual English

Page 19: Cross-Language Evaluation Forum (CLEF)

Results – Mono-lingual Italian

Page 20: Cross-Language Evaluation Forum (CLEF)

Results – Mono-lingual Spanish

Page 21: Cross-Language Evaluation Forum (CLEF)

Results – Mono-lingual French

Page 22: Cross-Language Evaluation Forum (CLEF)

Results – Mono-lingual German

Page 23: Cross-Language Evaluation Forum (CLEF)

Results – Bi-lingual

Page 24: Cross-Language Evaluation Forum (CLEF)

Results – Multi-lingual

Multilingual  1st          2nd           3rd              4th
Participant   jaen         daedalus      colesir          reina
MAP           27.85%       22.67%        22.63%           19.96%
GMAP          15.69%       11.04%        11.24%           13.25%
Run           ujamlrsv2    mlRSFSen2S    CoLesIRmultTst   reinaES2mtdtest

Page 25: Cross-Language Evaluation Forum (CLEF)

Results – Multi-lingual

Page 26: Cross-Language Evaluation Forum (CLEF)

Topic Analysis

• Example: topic 64
  – Easiest topic for mono and bi German
  – Hardest topic for mono Italian
• Example: topic 144
  – Easiest topic for the four bi Dutch runs
  – Hardest for mono and bi German
• Example: topic 146
  – Among the three hardest topics for four sub-tasks
  – Mid-range for all other sub-tasks

Page 27: Cross-Language Evaluation Forum (CLEF)

Approaches

• SINAI expanded topics with terms gathered from a web search engine [Martinez-Santiago et al. 2006]
• REINA used a heuristic for determining hard topics during training; different expansion techniques were applied [Zazo et al. 2006]
• Hummingbird experimented with evaluation measures other than those used in the track [Tomlinson 2006]
• MIRACLE tried to find a fusion scheme which works well for the robust measure [Goni-Menoyo et al. 2006]
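The fusion idea can be illustrated with a standard CombSUM-style merge of several runs; the sketch below is a generic example of that technique and is not MIRACLE's actual scheme, whose details are in the cited paper.

```python
from collections import defaultdict

def combsum(runs):
    """Generic CombSUM fusion (illustrative only, not MIRACLE's actual scheme):
    min-max normalize each run's scores, then sum them per document."""
    fused = defaultdict(float)
    for run in runs:                      # run: dict mapping doc id -> retrieval score
        lo, hi = min(run.values()), max(run.values())
        span = (hi - lo) or 1.0           # avoid division by zero for constant runs
        for doc, score in run.items():
            fused[doc] += (score - lo) / span
    # highest fused score first
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)

# Toy example with two runs over three documents
run1 = {"d1": 12.0, "d2": 7.5, "d3": 3.1}
run2 = {"d1": 0.4, "d2": 0.9, "d3": 0.1}
print(combsum([run1, run2]))   # d2 ranks first, then d1, then d3
```

Fusion of complementary runs tends to lift the worst-performing topics, which is why it is a natural candidate for optimizing a robust measure such as GMAP.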

Page 28: Cross-Language Evaluation Forum (CLEF)

Outlook

• What can we do with the data?
• Have systems improved in comparison to CLEF 2001 through 2003?
• Are low MAP scores a good indicator of topic difficulty?

Page 29: Cross-Language Evaluation Forum (CLEF)

AcknowledgementsAcknowledgementsAcknowledgementsAcknowledgements

• Giorgio di Nunzio & Nicola Ferro (U Padua)

• Robust Committee – Donna Harman (NIST)– Carol Peters (ISTI-CNR)– Jacques Savoy (U Neuchâtel)– Gareth Jones (Dublin City U)

• Ellen Voorhees (NIST)

• Participants

Page 30: Cross-Language Evaluation Forum (CLEF)

Thanks for your Attention

I am looking forward to the Discussion

Please come to the Breakout Session

