The Importance of Evaluation for Multilingual Information Retrieval
Carol Peters, ISTI-CNR, Pisa, Italy
FIRE 2011, IIT Bombay, 2-4 December 2011
From FIRE 2008 to FIRE 2010
- FIRE 2008: CLEF: Objectives and First Results
- FIRE 2010: 10 Years of CLEF: An Assessment
  - What we’ve done
  - What we’ve learned
  - What the next steps should be
- FIRE 2011: Exploiting the Results for MLIR System Building
In IR the role of an evaluation campaign is to:
- Identify priority areas for research: evaluation permits hypotheses to be validated and progress assessed
- Support system development and testing: evaluation saves developers time and money
MLIR/CLIR System Evaluation
1997 – First MLIR/CLIR system evaluation campaigns in US and Japan: TREC and NTCIR
2000 – MLIR/CLIR evaluation in Europe: CLEF (extension of CLIR track at TREC)
2008 – FIRE: MLIR/CLIR evaluation for Indian languages
Results
These evaluation initiatives:
- Promote research
- Encourage creation of multi-disciplinary communities
- Produce vast amounts of valuable scientific data
- Favour understanding of issues involved in successful system development
Outline
- The Need for MLIR/CLIR?
- What are the Challenges?
- What is the Contribution of Evaluation?
- The Example of CLEF
MLIR in the Information Society
- Web is an important platform for knowledge dissemination and acquisition
- User information needs are increasingly varied: from primarily academic use to widespread commercial, leisure, educational, entertainment etc. uses
- Content is available in many languages and non-English content is growing rapidly
- Information providers and seekers should have equal opportunities
- Preservation of national languages
The Need for Multilingual Search
http://www.internetworldstats.com/stats.htm
Countries with most Internet Users
Country | Population | Internet Users 2000 | Internet Users 2011 | Penetration (% of Pop.)
China | 1,336,718,015 | 22,500,000 | 485,000,000 | 36.3%
United States | 313,232,044 | 95,354,000 | 245,000,000 | 78.2%
India | 1,189,172,906 | 5,000,000 | 100,000,000 | 8.4%
Japan | 126,475,664 | 47,080,000 | 99,182,000 | 78.4%
Brazil | 203,429,773 | 5,000,000 | 75,982,000 | 37.4%
Germany | 81,471,834 | 24,000,000 | 65,125,000 | 79.9%
Russia | 138,739,892 | 3,100,000 | 59,700,000 | 43.0%
UK | 62,698,362 | 15,400,000 | 51,442,100 | 82.0%
France | 65,102,719 | 8,500,000 | 45,262,000 | 69.5%
Nigeria | 155,215,573 | 20,000 | 43,982,200 | 28.3%
Source: http://www.internetworldstats.com/top20.htm
MLIR related research
Concerns the storage, access, retrieval and presentation of digital information in any of the world's languages.
Main areas of interest:
- Enabling technology (character encoding, scripts, internationalisation, localisation)
- Multiple language access, browsing, retrieval, display
- Crossing the language boundary (filtering, merging, ranking, selecting, presenting results)
The Terminology
- Multilingual Information Access (MLIA): accessing, querying and retrieving information from collections in any language (covering basic enabling techniques and including MLIR and CLIR)
- Multilingual Information Retrieval (MLIR): information retrieval in multiple languages (includes CLIR)
- Cross-Language Information Retrieval (CLIR): querying multilingual collections in one language in order to retrieve documents in other languages
The Grand Challenge
Fully multilingual, multimodal IR systems:
- capable of processing a query in any medium and any language
- finding relevant information from a multilingual multimedia collection containing documents in any language and form
- and presenting it in the style most likely to be useful to the user
Oard & Hull, AAAI Spring Symposium, Stanford 1997
MLIR/CLIR System Development is Complex
There are 6,800 known languages spoken in 200 countries:
- ca. 2,250 have writing systems (the others are only spoken)
- Just 300 have some kind of language processing tools
MLIR/CLIR system development involves integrating IR techniques with language processing tools and language transfer mechanisms
MLIR/CLIR System Development is Complex
- Multilingual Portals (Localization): how many languages / how many levels should be multilingual / how to handle updates / linguistic and cultural dependent issues
- Monolingual Search for Multiple Languages: encoding and representation issues / language identification / indexing issues (stop words, stemmers, morphological analysers, named entity recognition, ...)
- Cross-Language Search: translation resources (lexicons, corpora, MT systems)
- Presentation of Results: in a form interpretable and exploitable by the user
Main Challenges
- Understanding Search in the Multilingual Context (language & culture)
- Globalisation (internationalisation & localisation)
- MLIR/CLIR System Development
  - Language processing tools
  - Best retrieval mechanisms (indexing, matching, merging)
  - Best translation resources
  - From text to multimodal retrieval
  - Providing effective user support
- Going from Research to Practice
Building a CLIR System
- Pre-process & index both documents and queries, generally using language-dependent techniques (tokenisation, stopwords, stemming, morphological analysis, decompounding, etc.)
- Translate queries or documents (or both), using translation resources:
  • Machine Translation (MT)
  • Parallel/comparable corpora
  • Bilingual Dictionaries
  • Multilingual Thesauri
  • Conceptual Interlingua
- Find relevant documents in target collection(s) & present results
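The three steps of a CLIR system (pre-process and index, translate, match) can be sketched as a toy query-translation pipeline. This is only an illustrative sketch: the stopword list, the German-to-English dictionary and the term-frequency scoring are hypothetical stand-ins chosen for brevity, not resources or methods from CLEF.

```python
import re
from collections import Counter

# Hypothetical toy resources (assumptions for illustration only).
DE_STOPWORDS = {"der", "die", "das", "und", "in"}
DE_EN_DICT = {"wirtschaft": ["economy", "business"], "krise": ["crisis"]}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def translate_query(query):
    """Dictionary-based query translation; OOV terms are kept untranslated."""
    terms = []
    for token in tokenize(query):
        if token in DE_STOPWORDS:
            continue
        terms.extend(DE_EN_DICT.get(token, [token]))
    return terms


def build_index(docs):
    """Map each document id to its term-frequency counts."""
    return {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}


def search(query, index):
    """Rank documents by summed term frequency of the translated query terms."""
    terms = translate_query(query)
    scores = {doc_id: sum(tf[t] for t in terms) for doc_id, tf in index.items()}
    matching = [d for d in scores if scores[d] > 0]
    return sorted(matching, key=lambda d: (-scores[d], d))


docs = {
    "d1": "The economy is in crisis",
    "d2": "Football results",
    "d3": "Business news",
}
index = build_index(docs)
print(search("Die Wirtschaft in der Krise", index))
```

Keeping all dictionary translations of a term, as done here, is the simplest way to handle ambiguity; a real system would weight or disambiguate the alternatives.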
Main CLIR Difficulties (I)
- Language identification
- Morphology: inflection, derivation, compounding, …
- OOV terms, e.g. proper names, terminology
- Multi-word concepts, e.g. phrases and idioms
- Ambiguity, e.g. polysemy
- Handling many languages: L1 -> Ln
- Merging results from different sources / media
- Presenting the results in a useful fashion
Main CLIR Difficulties (II)
- CLIR systems need clever pre-processing of target collections (e.g. semantic analysis, classification, information extraction)
- CLIR systems need intelligent post-processing of results: merging / summarization / translation
- CLIR systems need well-developed resources:
  - Language Processing Tools
  - Language Resources
- Resources are expensive to acquire, maintain, update
CLIR for Multimedia
- Retrieval from a mixed media collection is a non-trivial problem
- Different media are processed in different ways and suffer from different kinds of indexing errors:
  - spoken documents indexed using speech recognition
  - handwritten documents indexed using OCR
  - images indexed using significant features
- Need for complex integration of multiple technologies
- Need for merging of results from different sources
Clough October 2011
Supporting the User
MLIR/CLIR System Evaluation is Complex
- Need to evaluate single components
- Need to evaluate overall system performance
- Need to distinguish CL aspects from IR issues
Cross Language Evaluation Forum
Objectives of CLEF: promote research and stimulate development of multilingual IR systems, through
- Creation of evaluation infrastructure and organisation of regular evaluation campaigns for system and component testing
- Building of an MLIA/CLIR research community
- Construction of publicly available test-suites
The Vision: step-by-step, promote the development of truly multilingual, multimodal systems
Evolution of CLEF
- CLEF 2000, TEXT: mono-, bi- & multilingual text doc retrieval (Ad Hoc); mono- and cross-language information on structured scientific data (Domain-Specific)
- CLEF 2001, USER NEEDS: interactive cross-language retrieval (iCLEF)
- CLEF 2002, SPEECH: cross-language spoken document retrieval (CL-SR)
- CLEF 2003, IMAGE & QA: multiple language question answering (QA@CLEF); cross-language retrieval in image collections (ImageCLEF)
- CLEF 2005, WEB & GIR: multilingual retrieval of Web documents (WebCLEF); cross-language geographical retrieval (GeoCLEF)
- CLEF 2008, VIDEO: cross-language video retrieval (VideoCLEF); multilingual information filtering (INFILE@CLEF)
- CLEF 2009, USERS & APPLICATIONS: intellectual property (CLEF-IP); log file analysis (LogCLEF); large-scale grid experiments (Grid@CLEF)
CLEF Tracks: 2000 - 2009
Advancing State-of-Art through Evaluation: AdHoc
AdHoc Track: promotes development of mono- and cross-language text retrieval systems
- AdHoc 2000-2007: European news documents; increasingly complex & diverse tasks (Monolingual, Bilingual, Multilingual)
- AdHoc 2008-2009: non-European news docs; library catalog archives
- Advanced Tasks, using previously built test collections:
  - Multilingual 2 yrs on / merging
  - Robust: measuring stable performance
Ad Hoc: Importance of Monolingual IR
- Need to understand processing requirements of all languages to be queried, e.g. morphology, syntax, segmentation, special features
- Need to adopt best approach per language
- CLEF test collection includes wide variety of European language types:
  - Germanic: Dutch, English, German, Swedish
  - Romance: French, Italian, Portuguese, Spanish
  - Slavic: Russian, Bulgarian, Czech
  - Non-Indo-European: Finnish, Hungarian
AdHoc: Growth in Target Languages
[Figure: number of AdHoc target languages per year, 2000-2008 (vertical axis 0 to 14). Languages shown: English, French, German, Italian, Spanish, Dutch, Finnish, Swedish, Russian, Portuguese, Bulgarian, Hungarian, Czech. Source: Donna Harman, CLEF 2009 Workshop]
Ad Hoc Track 2000-2007: Bilingual & Multilingual Tasks
Tasks made increasingly difficult over the years
CLEF 2003: 2 multilingual tasks
- Small-multilingual: 4 “core” languages (EN,ES,FR,DE)
- Large-multilingual: 8 languages (+FI,IT,NL,SV)
Bilingual: “unusual” language combinations
- IT → ES; FR → NL
- DE → IT; FI → DE
- x → RU; newcomers only: x → EN
CLEF 2007: Non-European topic languages
- AM/ID/OR/ZH → EN
- BN/HI/MR/TA/TE → EN
Growth in Complexity
AdHoc | Monolingual | Bilingual | Multilingual
CLEF2000 | DE;FR;IT | X→EN | X→DE;EN;FR;IT
CLEF2001 | DE;ES;FR;IT;NL | X→EN, X→NL | X→DE;EN;ES;FR;IT
CLEF2002 | DE;ES;FI;FR;IT;NL;SV | X→DE;ES;FI;FR;IT;NL;SV, X→EN (newcomer) | X→DE;EN;ES;FR;IT
CLEF2003 | DE;ES;FI;FR;IT;NL;RU;SV | IT→ES, DE→IT, FR→NL, FI→DE, X→RU, X→EN | X→DE;EN;ES;FR, X→DE;EN;ES;FI;FR;IT;NL;SV
CLEF2004 | FI;FR;RU;PT | ES/FR/IT/RU→FI, DE/FI/NL/SV→FR, X→RU, X→EN | X→FI;FR;RU;PT
CLEF2005 | BG;FR;HU;PT | X→BG;FR;HU;PT, X→EN | Multi8 2yrs on, Multi8 merge
CLEF2006 | BG;FR;HU;PT | X→BG;FR;HU;PT, X→EN | ROBUST: X→DE;EN;ES;FR;NL
CLEF2007 | BG;CZ;HU, ROBUST: EN;FR;PT | X→BG;CZ;HU, AM/ID/OR/ZH→EN, BN/HI/MR/TA/TE→EN, ROBUST: X→EN;FR;PT |
CLEF2008 & CLEF2009 | FA, TEL: DE;EN;FR, ROBUST: WSD EN | EN→FA, TEL: X→DE;EN;FR, ROBUST: WSD ES→EN |
Advancing State-of-Art for Document Retrieval
Quantifiable improvements in performance of CLIR systems
- TREC-6 1997: EN→FR: 49%; EN→DE: 64% of best monolingual FR & DE retrieval
- CLEF recent:
  - 2008: EN→IR: 92% of best mono retrieval
  - 2009 TEL: X→EN 99%; X→DE 90%; X→FR 94% of best mono retrieval
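The "% of best monolingual retrieval" figures compare a cross-language run against the best monolingual run on the same target collection. A minimal sketch of that ratio, assuming both runs are summarised by mean average precision (MAP); the function name and the example values are hypothetical, not CLEF results:

```python
def percent_of_monolingual(cross_language_map, monolingual_map):
    """Express cross-language effectiveness as a percentage of the monolingual baseline."""
    if monolingual_map <= 0:
        raise ValueError("monolingual baseline must be positive")
    return 100.0 * cross_language_map / monolingual_map


# Made-up MAP values, for illustration only.
print(round(percent_of_monolingual(0.23, 0.25), 1))
```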
Advancing State-of-Art: Summary
Investigation of core issues in MLIR/CLIR:
- development of multiple language processing tools
- creation of linguistic resources
- analysis of user behaviour
- implementation of appropriate cross-language retrieval models and algorithms for different tasks and languages
Development of MLIR/CLIR systems:
- From bilingual to multilingual
- From document retrieval to information extraction
- From free text to structured text
- From mono-media to multimedia
CLEF Achievements at 10 year milestone: Test Collections
- CLEF multilingual comparable corpus of more than 3M news docs in 15 languages: BG, CZ, DE, EN, ES, EU, FI, FR, HU, IT, NL, RU, SV, PT and Persian
- The European Library data in DE, EN, FR (>3M docs)
- GIRT-4 social science database in EN and DE; Russian ISISS collection; Cambridge Sociological Abstracts
- Online Flickr database
- IAPR TC-12 photo database (20,000 images, captions in EN, DE)
- ARRS Goldminer database (200,000 medical images)
- IRMA: 10,000 images for automatic medical image annotation
- INEX Wikipedia image collection (150,000 images)
- Very large multilingual collection of Web docs (EuroGov)
- Malach spontaneous speech collection, EN & CZ (Shoah archives)
- Dutch / English documentary TV videos
- Agence France-Presse (AFP) newswire in Arabic, French & English
- Patent documents from the European Patent Office
New Vision & Tools for Comparative System Evaluation
CLEF Achievements at 10 year milestone
Development of methodology, tools, resources:
- Language resources: stopword lists, dictionaries, lexicons, parallel & aligned corpora
- Linguistic components: stemmers, lemmatizers, PoS taggers, decompounders
- Translation approaches: MT, dictionary-based, corpora-based, conceptual networks
- IR model testing: boolean, vector space, probabilistic, language models
- Advanced IR approaches: data fusion, query expansion, relevance feedback
- Interface issues: user assistance in query formulation & results presentation in multilingual context
Best Practices for MLIR/CLIR
(Exploiting CLEF experience, results, knowhow to provide guidelines for users and developers)
TrebleCLEF Best Practices and Guidelines
Best Practices White Papers produced, exploiting CLEF experience, results, knowhow to provide guidelines for users and developers:
- Best Practices for Language Resources for MLIA, Nicolas Moreau, ELDA
- Best Practices for System- and User-oriented MLIA, Martin Braschler, Zurich Univ. Applied Technology & Julio Gonzalo, UNED Madrid
- Best Practices for Test Collection Creation and Evaluation Methodologies, Mark Sanderson, Univ. Sheffield & Martin Braschler, Zurich Univ. Applied Technology
Designed for practitioners rather than for academics
Best Practices in MLIR/CLIR Language Resources
Objectives:
- Enable system developers to identify and find the tools and resources they need when implementing MLIR/CLIR system functionality
- Foster collaborative resource creation
- Foster creation of common pools of resources, e.g. for specific less-resourced languages
- Foster dissemination of resources after evaluation campaigns and projects are over
Best Practices in Test Collection Creation
- Practical guidelines for evaluating in an efficient and cost-effective way
- Aims at bridging the gap between the academic research community and practitioners
- Provides information on necessary resources, procedures and methods
- Lists questions to consider when evaluating:
  - What is the purpose of the evaluation?
  - What resources are available to conduct the evaluation?
  - What do you know about the IR system being tested?
- Measuring IR system effectiveness
- Comparing Results & Using Significance Tests
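Measuring effectiveness and comparing two systems with a significance test can be sketched as follows. Average precision and an exact paired sign-flip (permutation) test are used here as common choices; they are assumptions for illustration, since the slide does not name a specific measure or test. The exact test enumerates all sign assignments, so it is only practical for a small number of topics.

```python
from itertools import product


def average_precision(ranked_ids, relevant_ids):
    """Average of the precision values at each rank where a relevant document appears."""
    relevant = set(relevant_ids)
    hits = 0
    total = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def paired_permutation_p_value(scores_a, scores_b):
    """Two-sided exact sign-flip test on per-topic score differences."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(sum(diffs))
    extreme = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        # Count sign assignments at least as extreme as the observed difference.
        if abs(sum(s * d for s, d in zip(signs, diffs))) >= observed - 1e-12:
            extreme += 1
    return extreme / total


ap = average_precision(["d1", "d2", "d3", "d4"], {"d1", "d3"})
p_value = paired_permutation_p_value([3, 4, 5], [1, 2, 3])
print(ap, p_value)
```

With only three topics the smallest possible two-sided p-value is 0.25, which illustrates why test collections need enough topics before differences between systems can be called significant.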
Best Practices in MLIR/CLIR System-Oriented Aspects
[Figure: CLIR processing diagram with the components Query, Documents, Indexing, Translation, Query representation, Document representation, Index, Matching and Result; example term: Wirtschaft]
Recommendations in three parts:
- Indexing
- Translation
- Matching
Best Practices: Indexing for MLIR/CLIR
- Use weighted & ranked retrieval: copes with translation error
- Use Unicode/XML: covers different scripts
- Use minimal stopword elimination: keeps maximum information
- Remove diacritics, special characters: tolerant towards inconsistent spelling
- Use stemming: covers different word forms
- Use decompounding for languages with productive compound formation: tolerant towards different phrasings
- Use character n-grams when stemming resources are not available: helps with languages with scarce language resources
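Two of the indexing recommendations, diacritic removal and character n-grams, need no language-specific resources and can be sketched with the Python standard library. The function names and the choice of n = 4 are assumptions for illustration; a suitable n-gram length depends on the language.

```python
import unicodedata


def strip_diacritics(text):
    """Decompose characters (NFKD) and drop the combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def char_ngrams(token, n=4):
    """Overlapping character n-grams of a normalised token, used as index terms."""
    token = strip_diacritics(token.lower())
    if len(token) <= n:
        return [token]
    return [token[i:i + n] for i in range(len(token) - n + 1)]


print(strip_diacritics("Müller café"))
print(char_ngrams("Häuser"))
```

Note that this only removes marks that Unicode can decompose; characters such as the German "ß" are left unchanged and would need an explicit mapping.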
Major Results: Translation & Matching
- Maximize coverage of translation resources: reduces retrieval failure due to missing translations
- Use document translation to solve merging problems, if combined results in multiple languages are needed
- Combine different types of translation resources: minimizes mistranslations inherent to the individual resources
- Use an interlingua when direct translation resources are unavailable: covers language pairs with no direct translation resources
- Use high-performing weighting schemes, e.g. Okapi/BM25, LM, DFR or lnu.ltn: weighting schemes with robust performance over different types of text
- Use pseudo-relevance feedback: boosts recall (coverage of results)
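As an illustration of one of the weighting schemes named above, here is a minimal sketch of Okapi BM25 scoring over tokenised documents. The parameter defaults (k1 = 1.2, b = 0.75) and the non-negative idf variant are common choices assumed here, not values taken from the CLEF experiments; the collection is assumed to be non-empty.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenised document against the query terms with Okapi BM25."""
    n_docs = len(docs)
    avg_len = sum(len(tokens) for tokens in docs.values()) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))
    scores = {}
    for doc_id, tokens in docs.items():
        tf = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores[doc_id] = score
    return scores


docs = {
    "d1": ["economy", "crisis", "economy"],
    "d2": ["football", "results"],
    "d3": ["economy", "news"],
}
scores = bm25_scores(["economy", "crisis"], docs)
print(sorted(scores, key=lambda d: -scores[d]))
```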
Main Recommendations
Blueprint for Effective System
- Effective, well-tuned monolingual retrieval for as many languages as possible
- Combination of different sources of translation information
- Merging of multiple, well-tuned bilingual results
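Merging bilingual result lists is hard because scores from different collections are not directly comparable. One simple option, shown here only as an assumed illustration and not as the method recommended in the white paper, is to min-max normalise each list's scores before interleaving. The document ids and scores are made up.

```python
def merge_results(result_lists):
    """Merge ranked lists of (doc_id, score) pairs after per-list min-max normalisation."""
    merged = {}
    for results in result_lists:
        if not results:
            continue
        values = [score for _, score in results]
        low, high = min(values), max(values)
        for doc_id, score in results:
            # A list whose scores are all equal maps every document to 1.0.
            normalised = (score - low) / (high - low) if high > low else 1.0
            merged[doc_id] = max(merged.get(doc_id, 0.0), normalised)
    # Sort by normalised score, breaking ties alphabetically for determinism.
    return sorted(merged, key=lambda d: (-merged[d], d))


french_run = [("fr1", 10.0), ("fr2", 5.0), ("fr3", 0.0)]
german_run = [("de1", 2.0), ("de2", 1.0)]
print(merge_results([french_run, german_run]))
```

Min-max normalisation treats the top document of every list as equally good, which is a strong assumption when one target collection contains far fewer relevant documents than another.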
Best Practices in MLIA User-oriented Aspects
Recommendations in three parts:
Cross-Language Document Selection
Query Translation & Refinement
Personalization
Personalization
- Allow user to specify language skills and translation preferences in user profile
Query Translation & Refinement
Help the user choose the right translation!
Recommendations:
- Include user-assisted query translation facilities, but do not show them by default
- Indirect user-assisted query translation without target-language inspection is preferable
- Link structured sources that help mapping the meaning of the query, e.g. named entities, Wikipedia, synonyms, KWIC
- Query translation, document translation & assisted query translation facilities must fit together
Search in the Multilingual Context
(includes culture and language)
Peters, Braschler, Clough, 2011
What is Culture?
- Language is the most direct expression of culture; it is what makes us human and what gives each of us a sense of identity (EC 2005)
- Members of the same culture are likely to have the same knowledge of certain things and would think and act similarly in certain situations
- Cultural aspects to consider include religion, customs, colours, metaphors, icons and flags, and language
- Crossing language boundaries implies crossing cultural boundaries
- Localisation of the search user interface... but so much more
Clough October 2011
Europeana
From the Lab to the Market Place
If research has been successful and if the problem is (nearly) solved, then
why are there so few commercial systems?
Peters, Braschler, Clough, 2011
Challenges of Integration into the Enterprise
- Search system must run on a single ‘off-the-shelf’ server
- System must be easily integrated into the client’s platform
- Response times even for complex queries must be fast (<2 s)
- Scalability problems must be resolved (CLIR queries are typically several times larger than in monolingual search)
- Easy tuning of parameters to achieve precision
- High quality translation of results and presentation according to customers’ requirements
- The expected costs for customer support, integration and maintenance must be low
- The necessary lexical and translation resources must be easy to acquire and easy to optimise to meet client’s needs
Peters, Braschler, Clough, 2011
Summing up
Evaluation provides opportunity to test, tune, and compare approaches in order to improve system performance
Evaluation campaigns promote research
Evaluation campaigns produce huge amounts of valuable experimental data
Evaluation campaigns create communities interested in examining the same issues and comparing ideas and experiences
BUT
Future Directions
- Search systems are used interactively; evaluation must also consider multilingual interface design
- Multilingual search is NOT just an engineering problem; study impact of users’ cultural background and language skills
- Multilingual searching is part of wider information seeking activities; the multilingual functionality must be integrated into a larger application
- Research has shown we can do cross-language search in multiple languages well; the challenge is going from research to practice
Summing Up
Never forget the Vision