Microsoft Research India’s Participation in FIRE2008

Raghavendra Udupa

[Diagram: CLIR example. Components shown: Query Translator, Dictionary, Document Ranker, Inverted Index, LA Times 2002 articles.]

Hindi query (CLEF'07 Query #10.2452/447-AH), in English translation: "Pim Fortuyn's politics. Find documents in which Pim Fortuyn's political views are discussed."

English query: Pim Fortuyn politics

CLIR System

[Diagram: core components are Query Translator, Dictionary, Document Ranker, Inverted Index, Document Collection.]

Research threads around the core system:

- Domain Adaptation
- Mining Translation Lexicon from Comparable Corpora (MT Summit 2007)
- Mining transliterations of OOV terms (ECIR 2009)
- Cross-Language Ranking Models
- Mining NETE Transliterations from Comparable Corpora (CIKM'08)

Baseline Retrieval System

Language Model-Based Retrieval

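\[
\begin{aligned}
\mathrm{Score}(q_S \mid d_T) &= \sum_{w_T} P(w_T \mid q_S)\,\log P(w_T \mid d_T) \\
&= \sum_{w_T} \sum_{w_S} P(w_S \mid q_S)\,P(w_T \mid w_S)\,\log P(w_T \mid d_T).
\end{aligned}
\]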

Probabilistic Translation Lexicon

- ~100K parallel sentences
- IBM Model 3 alignment
- GIZA++

J. Jagarlamudi and A. Kumaran, Cross-Lingual Information Retrieval System for Indian Languages. Working Notes for the CLEF 2007 Workshop.
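The language-model scoring above can be sketched in a few lines of Python. This is a minimal illustration, not the system's implementation: it assumes that subscripts S and T denote the source (query) and target (document) languages, that P(w_S | q_S) is the relative frequency of the term in the query, and that the document model is smoothed with Jelinek-Mercer interpolation against a collection model. The smoothing choice, the `lam` parameter, and all function and variable names are assumptions made for this sketch.

```python
import math
from collections import Counter, defaultdict


def query_model(query_terms, lexicon):
    """P(w_T | q_S) = sum over w_S of P(w_T | w_S) * P(w_S | q_S).

    `lexicon` maps a source term to {target term: P(w_T | w_S)}.
    P(w_S | q_S) is taken as relative frequency in the query (assumption).
    """
    counts = Counter(query_terms)
    total = sum(counts.values())
    model = defaultdict(float)
    for w_s, c in counts.items():
        for w_t, p in lexicon.get(w_s, {}).items():
            model[w_t] += p * c / total
    return dict(model)


def score(query_terms, doc_terms, lexicon, collection_model, lam=0.5):
    """Score = sum over w_T of P(w_T | q_S) * log P(w_T | d_T).

    P(w_T | d_T) uses Jelinek-Mercer smoothing (an assumption of this sketch).
    """
    doc_counts = Counter(doc_terms)
    doc_len = len(doc_terms)
    total = 0.0
    for w_t, p_q in query_model(query_terms, lexicon).items():
        p_ml = doc_counts[w_t] / doc_len if doc_len else 0.0
        p_d = lam * p_ml + (1.0 - lam) * collection_model.get(w_t, 1e-9)
        total += p_q * math.log(p_d)
    return total


# Hypothetical toy data: a romanized source term and two target documents.
lexicon = {"rajneeti": {"politics": 0.7, "policy": 0.3}}
collection_model = {"politics": 0.01, "policy": 0.01}
doc_a = ["dutch", "politics", "news"]
doc_b = ["sports", "news", "today"]
score_a = score(["rajneeti"], doc_a, lexicon, collection_model)
score_b = score(["rajneeti"], doc_b, lexicon, collection_model)
```

With this toy data, the document that contains a likely translation of the query term receives the higher score.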

FIRE Fighting

Mining Transliterations of Out-Of-Vocabulary Query Terms.

Date-Based Document Restriction.

Mining Transliterations of Out-Of-Vocabulary Query Terms

Raghavendra Udupa

OOV Query Terms

- Many OOV query terms are named entities (NEs).
  - NEs are often the focus of a query.
  - NEs form an open class of terms in all languages.
  - Getting their transliterations right is extremely important.
- Many OOV query terms are not NEs but transliterations of English words, e.g. सेमिनार (seminar), कॉर्पोरेशन (corporation), चैम्पियन (champion), फिल्म (film).

A Hypothesis

The transliterations of most of the transliteratable OOV terms of a query can be found in documents relevant to the query.

Empirical Validation

| Collection | Transliteratable OOV terms | Terms with transliterations in at least one relevant document | Terms with transliterations in at least 50% of relevant documents |
|---|---|---|---|
| CLEF 2006 (Hindi) | 62 | 58 (94%) | 49 (79%) |
| CLEF 2007 (Hindi) | 47 | 42 (89%) | 34 (72%) |
| CLEF 2007 (Tamil) | 43 | 42 (98%) | 39 (89%) |

A Practical Hypothesis

The transliterations of many of the transliteratable OOV terms of a query can be found in the top results of the CLIR system for the query.

Mining OOV Transliteration Equivalents

Basic Idea:

- Pair the query with each of the top N results.
- Treat each pair as a comparable document pair.
- Mine transliteration equivalents from the comparable document pairs.

"They are out there, if you know where to look": Mining Transliterations of OOV Query Terms for Cross-Language Information Retrieval. ECIR 2009, Toulouse.
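The basic idea above can be sketched as a loop over the top N results. The sketch below is illustrative only: it assumes the OOV terms have already been romanized, and it uses a plain string-similarity ratio from Python's `difflib` as a stand-in for a real transliteration similarity model. The slides do not describe the model used in the ECIR 2009 work, so `similarity`, `threshold`, and the data layout are all hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Stand-in for a transliteration similarity model (assumption)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def mine_transliterations(oov_terms, top_docs, threshold=0.8):
    """Pair the query's OOV terms with each top-ranked document and keep,
    for each term, the best-scoring document word above `threshold`.

    oov_terms: romanized OOV query terms (assumption)
    top_docs:  list of documents, each a list of tokens
    """
    mined = {}
    for term in oov_terms:
        best_word, best_score = None, threshold
        for doc in top_docs:  # each (query, doc) is one comparable pair
            for word in doc:
                s = similarity(term, word)
                if s > best_score:
                    best_word, best_score = word, s
        if best_word is not None:
            mined[term] = best_word
    return mined


# Hypothetical top-2 results for a query with two OOV terms.
top_docs = [
    ["pim", "fortuyn", "dutch", "politics"],
    ["fortune", "party"],
]
mined = mine_transliterations(["fortayn", "xyz"], top_docs)
```

The mined pairs would then be added to the translation lexicon and the query re-run; terms with no sufficiently similar word in the top results stay untranslated.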

Long Queries: MAP

| Collection | Baseline | Transliterations Mining | % change over baseline |
|---|---|---|---|
| CLEF 2006 (Hindi) | 0.1463 | 0.2476 | +69.24* |
| CLEF 2007 (Hindi) | 0.2521 | 0.3389 | +34.43* |
| CLEF 2007 (Tamil) | 0.1848 | 0.2270 | +22.84* |

Short Queries: MAP

| Collection | Baseline | Transliterations Mining | % change over baseline |
|---|---|---|---|
| CLEF 2006 (Hindi) | 0.0877 | 0.1467 | 67.3 |
| CLEF 2007 (Hindi) | 0.1829 | 0.2323 | 27.0 |
| CLEF 2007 (Tamil) | 0.1024 | 0.1265 | 23.5 |
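The "% change over baseline" figures in the long-query and short-query results are relative MAP improvements. A short check, using only the MAP values reported above (the function name is chosen for this sketch):

```python
def percent_change(baseline, improved):
    """Relative change of `improved` over `baseline`, in percent."""
    return 100.0 * (improved - baseline) / baseline


# (baseline MAP, MAP with transliterations mining), long queries
long_queries = {
    "CLEF 2006 (Hindi)": (0.1463, 0.2476),
    "CLEF 2007 (Hindi)": (0.2521, 0.3389),
    "CLEF 2007 (Tamil)": (0.1848, 0.2270),
}

for collection, (base, mined) in long_queries.items():
    print(f"{collection}: {percent_change(base, mined):+.2f}%")
```

The same function applied to the short-query MAP values gives the short-query percentages when rounded to one decimal place.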

FIRE 2008: MAP

| Queries | Baseline | Transliterations Mining | % change over baseline |
|---|---|---|---|
| Short (unofficial) | 0.2616 | 0.3191 | 22 |
| Long (unofficial) | 0.4351 | 0.4871 | 12 |
| Long (official) | 0.4140 | 0.4526 | 9 |

FIRE2008: MAP Difference (Long, official)

[Figure: per-query MAP difference, HE0121 - HE0120. x-axis: Query Number (ticks from 26 to 74); y-axis: MAP Difference (-0.10 to 0.70).]

FIRE 2008: Num_Rel_Ret

| Queries | Baseline | Transliterations Mining |
|---|---|---|
| Short (unofficial) | 70.60 | 80.0 |
| Long (unofficial) | 84.55% | 88.54% |
| Long (official) | 79.68% | 82.11% |

FIRE 2008: P@10

| Queries | Baseline | Transliterations Mining |
|---|---|---|
| Short (unofficial) | 0.1000 | 0.4320 |
| Long (unofficial) | 0.6260 | 0.6540 |
| Long (official) | 0.6120 | 0.6480 |
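For readers less familiar with the measures reported here, MAP and P@10 can be computed from a ranked result list and a set of relevance judgments roughly as follows. This is the standard textbook formulation, not the official evaluation tooling used at FIRE; the function names and toy data are assumptions of this sketch.

```python
def precision_at_k(ranked, relevant, k=10):
    """Fraction of the top-k ranked documents that are relevant."""
    return sum(1 for d in ranked[:k] if d in relevant) / k


def average_precision(ranked, relevant):
    """Mean of the precision values at each rank where a relevant
    document is retrieved, divided by the number of relevant documents."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """`runs` is a list of (ranked list, relevant set) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Hypothetical single-query example.
ranked = ["d1", "d2", "d3", "d4"]
relevant = {"d1", "d3"}
ap = average_precision(ranked, relevant)          # (1/1 + 2/3) / 2
p_at_2 = precision_at_k(ranked, relevant, k=2)    # 1 relevant in top 2
```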

Mining Transliterations @ FIRE2008 Worked.

Date-Based Document Restriction

Raghavendra Udupa

Dates

Some queries contain dates:

- CLEF 2007, Topic 407: Who was the Australian Prime Minister in 2002?
- CLEF 2007, Topic 411: …terrorist car bomb in Bali, Indonesia, in 2002.
- CLEF 2006, Topic 326: …winners in any category of the 1995 Emmy Awards.
- CLEF 2006, Topic 327: …earthquakes in Mexico City in 1995.

Hypothesis

If a query contains a date then the relevant documents for the query are likely to be from the same time period.

Empirical Validation

- CLEF'07: LA Times 2002
- CLEF'06: Glasgow Herald 1995 (GH 95), LA Times 1994

CLEF’06: C327

Title: Earthquakes in Mexico City

Description: Find documents that provide details on the impact of or the damage caused by earthquakes in Mexico City in 1995.

Narrative: Relevant document should contain some information on earthquakes in Mexico City in 1995, such as their magnitude, damages caused, panic of the inhabitants, etc. Documents on earthquakes in other places in Mexico are not relevant unless the seismic impact was also felt in Mexico City.

Relevant Document

<DOCNO> LA121194-0313 </DOCNO> <DOCID> 107228 </DOCID>

December 11, 1994, Sunday, Home Edition

A magnitude 6.3 earthquake rocked Mexico City, causing people to flee their homes in fear. There were no immediate reports of injuries or severe damage. The U.S. Geological Survey's National Earthquake Information Center in Golden, Colo., said the quake's epicenter was in Petatlan in the southwestern state of Guerrero.

Date-Based Document Restriction

- Identify dates (if any) in the query.
- Restrict candidate documents to the set of documents coming from the same time period.
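The two steps above can be sketched as follows. The slides do not define "same time period" or how dates are detected, so this sketch assumes that a date is a four-digit year in the query text and that the time period is the same calendar year. The document layout (a dict with a `year` field) is also hypothetical.

```python
import re

# Four-digit years from 1900 to 2099 (assumption about what counts as a date).
YEAR_RE = re.compile(r"\b(?:19|20)\d{2}\b")


def query_years(query):
    """Return the set of years mentioned in the query text."""
    return {int(m.group(0)) for m in YEAR_RE.finditer(query)}


def restrict_by_date(query, docs):
    """Keep only documents from a year mentioned in the query.

    If the query mentions no year, all candidate documents are kept.
    """
    years = query_years(query)
    if not years:
        return list(docs)
    return [d for d in docs if d["year"] in years]


# Hypothetical candidate documents.
docs = [
    {"id": "doc-a", "year": 1994},
    {"id": "doc-b", "year": 1995},
    {"id": "doc-c", "year": 2002},
]
kept = restrict_by_date("earthquakes in Mexico City in 1995", docs)
```

A hard filter like this discards every relevant document published outside the query's year, which is the failure mode the FIRE 2008 relevance counts later in the deck point to.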

FIRE 2008: Relevant Docs

| Topic | Relevant docs from a different time period |
|---|---|
| 44 | 11/56 |
| 47 | 23/32 |
| 48 | 70/76 |
| 50 | 18/61 |
| 52 | 2/38 |
| 73 | 10/53 |

FIRE 2008: Hindi-English MAP

| Queries | Without DR | With DR |
|---|---|---|
| Short | 0.2616 (unofficial) | 0.2601 (unofficial) |
| Long | 0.4351 (unofficial) | 0.4140 (official) |

DR = date-based document restriction.

Date-Based Document Restriction @ FIRE2008 Hurt us. Deeper investigation needed.

