CLIR System
[Diagram components: Query Translator, Dictionary, Document Ranker, Inverted Index, LA Times 2002 articles]
Example: CLEF'07 Query #10.2452/447-AH
Hindi title: Pim Fortuyn's politics
Hindi description: Find documents that discuss Pim Fortuyn's political views.
Translated query: Pim Fortuyn politics
[Diagram: CLIR system components (Query Translator, Dictionary, Document Ranker, Inverted Index, Document Collection), annotated with:]
Domain Adaptation
Mining Translation Lexicon from Comparable Corpora (MT Summit 2007)
Mining Transliterations of OOV Terms (ECIR 2009)
Mining NETE Transliterations from Comparable Corpora (CIKM'08)
Cross-Language Ranking Models
Baseline Retrieval System
Language Model-Based Retrieval
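\[
\mathrm{Score}(q_S \mid d_T) = \sum_{w_T} P(w_T \mid q_S)\,\log P(w_T \mid d_T)
  = \sum_{w_T} \sum_{w_S} P(w_S \mid q_S)\,P(w_T \mid w_S)\,\log P(w_T \mid d_T).
\]
Here q_S is the source-language query, d_T is a target-language document, and P(w_T | w_S) is the translation probability from the lexicon.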
Probabilistic Translation Lexicon
~100K parallel sentences
IBM Model 3 alignment (GIZA++)
J. Jagarlamudi and A. Kumaran, Cross-Lingual Information Retrieval System for Indian Languages. Working Notes for the CLEF 2007 Workshop.
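The language-model scoring on this slide weights each target-language word by how likely it is to be a translation of a query word, then sums the log-probabilities of those words under the document model. A minimal Python sketch of that idea follows. The function names, the Jelinek-Mercer smoothing, the floor value for unseen words, and the romanized toy terms are all assumptions made for illustration; the slides do not specify them.

```python
import math
from collections import Counter


def query_model(query_terms):
    """Maximum-likelihood estimate of P(w_S | q_S) from the query tokens."""
    counts = Counter(query_terms)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def document_model(doc_terms, collection_model, lam=0.5):
    """Return a function giving P(w_T | d_T).

    Smoothing with the collection model (Jelinek-Mercer) is an assumption;
    it keeps the log finite for words that do not occur in the document.
    """
    counts = Counter(doc_terms)
    total = sum(counts.values())

    def prob(word):
        ml = counts[word] / total if total else 0.0
        return lam * ml + (1.0 - lam) * collection_model.get(word, 1e-9)

    return prob


def score(query_terms, doc_terms, lexicon, collection_model, lam=0.5):
    """Sum over w_S and w_T of P(w_S|q_S) * P(w_T|w_S) * log P(w_T|d_T).

    lexicon maps a source word to {target word: P(w_T | w_S)}.
    Query words missing from the lexicon (OOV terms) contribute nothing.
    """
    p_source = query_model(query_terms)
    p_doc = document_model(doc_terms, collection_model, lam)
    total = 0.0
    for w_s, p_s in p_source.items():
        for w_t, p_t_given_s in lexicon.get(w_s, {}).items():
            total += p_s * p_t_given_s * math.log(p_doc(w_t))
    return total
```

In this sketch an OOV query term simply drops out of the sum, which is why the transliteration mining described later in the deck matters for ranking.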
FIRE Fighting
Mining Transliterations of Out-Of-Vocabulary Query Terms.
Date-Based Document Restriction.
OOV Query Terms
Many OOV query terms are NEs (named entities).
NEs are often the focus of a query.
NEs form an open class of terms in all languages.
Getting their transliterations right is extremely important.
Many OOV query terms are not NEs but transliterations of English words, e.g. सेमिनार (seminar), कॉर्पोरेशन (corporation), चैम्पियन (champion), फिल्म (film).
A Hypothesis
The transliterations of most of the transliteratable OOV terms of a query can be found in documents relevant to the query.
Empirical Validation
Collection | Transliteratable OOV terms | Terms with transliterations in at least one relevant document | Terms with transliterations in at least 50% of relevant documents
CLEF 2006 (Hindi) | 62 | 58 (94%) | 49 (79%)
CLEF 2007 (Hindi) | 47 | 42 (89%) | 34 (72%)
CLEF 2007 (Tamil) | 43 | 42 (98%) | 39 (89%)
A Practical Hypothesis
The transliterations of many of the transliteratable OOV terms of a query can be found in the top results of the CLIR system for the query.
Mining OOV Transliteration Equivalents
Basic Idea:
Pair the query with each of the top N results.
Treat each pair as a comparable document pair.
Mine transliteration equivalents from the comparable document pairs.
"They are out there, if you know where to look": Mining Transliterations of OOV Query Terms for Cross-Language Information Retrieval. ECIR 2009, Toulouse.
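The basic idea (pair the query with each top-N result and mine transliteration equivalents from the pairs) can be sketched roughly as below. This is a minimal illustration, not the mining method of the cited paper: it assumes the OOV terms have already been romanized, and it uses a character-level string similarity from Python's standard library as a stand-in for a real transliteration similarity model. The function names and the 0.75 threshold are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Stand-in transliteration similarity: character overlap ratio in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def mine_transliterations(oov_terms, top_docs, threshold=0.75):
    """Pair each OOV query term with words from the top-N retrieved documents.

    oov_terms: romanized OOV query terms (romanization is assumed done upstream).
    top_docs:  list of token lists, one per top-ranked document.
    Returns {oov_term: best-matching document word} for matches above threshold.
    """
    mined = {}
    for term in oov_terms:
        best_word, best_sim = None, threshold
        for doc in top_docs:
            # sorted() keeps the result deterministic when similarities tie
            for word in sorted(set(doc)):
                sim = similarity(term, word)
                if sim > best_sim:
                    best_word, best_sim = word, sim
        if best_word is not None:
            mined[term] = best_word
    return mined
```

The mined pairs would then be added to the query translation and the query re-run, which is the step the MAP tables on the following slides evaluate.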
Long Queries: MAP
Collection | Baseline | Transliterations Mining | % change over baseline
CLEF 2006 (Hindi) | 0.1463 | 0.2476 | +69.24*
CLEF 2007 (Hindi) | 0.2521 | 0.3389 | +34.43*
CLEF 2007 (Tamil) | 0.1848 | 0.2270 | +22.84*
Short Queries: MAP
Collection | Baseline | Transliterations Mining | % change over baseline
CLEF 2006 (Hindi) | 0.0877 | 0.1467 | 67.3
CLEF 2007 (Hindi) | 0.1829 | 0.2323 | 27.0
CLEF 2007 (Tamil) | 0.1024 | 0.1265 | 23.5
FIRE 2008: MAP
Queries | Baseline | Transliterations Mining | % change over baseline
Short (unofficial) | 0.2616 | 0.3191 | 22
Long (unofficial) | 0.4351 | 0.4871 | 12
Long (official) | 0.4140 | 0.4526 | 9
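The "% change over baseline" columns are consistent with the usual relative-improvement formula, 100 × (mined − baseline) / baseline. A small sketch that recomputes a few of the reported values from the MAP figures in the tables; the function and variable names are ours, not from the slides.

```python
def percent_change(baseline, improved):
    """Relative improvement over the baseline, in percent."""
    return 100.0 * (improved - baseline) / baseline


# (baseline MAP, MAP with transliteration mining) as reported on the slides
reported_map = {
    "CLEF 2006 (Hindi), long": (0.1463, 0.2476),
    "CLEF 2007 (Hindi), long": (0.2521, 0.3389),
    "CLEF 2007 (Tamil), long": (0.1848, 0.2270),
    "FIRE 2008, long (official)": (0.4140, 0.4526),
}

for name, (base, mined) in reported_map.items():
    print(f"{name}: {percent_change(base, mined):+.2f}%")
```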
FIRE 2008: MAP Difference (Long, official)
[Figure: per-query MAP difference, HE0121 - HE0120. x-axis: Query Number (26 to 74); y-axis: MAP Difference (-0.10 to 0.70)]
FIRE 2008: Num_Rel_Ret
Queries | Baseline | Transliterations Mining
Short (unofficial) | 70.60% | 80.0%
Long (unofficial) | 84.55% | 88.54%
Long (official) | 79.68% | 82.11%
FIRE 2008: P@10
Queries | Baseline | Transliterations Mining
Short (unofficial) | 0.1000 | 0.4320
Long (unofficial) | 0.6260 | 0.6540
Long (official) | 0.6120 | 0.6480
Dates
Some queries contain dates:
CLEF 2007, Topic 407: Who was the Australian Prime Minister in 2002?
CLEF 2007, Topic 411: …terrorist car bomb in Bali, Indonesia, in 2002.
CLEF 2006, Topic 326: …winners in any category of the 1995 Emmy Awards.
CLEF 2006, Topic 327: …earthquakes in Mexico City in 1995.
Hypothesis
If a query contains a date then the relevant documents for the query are likely to be from the same time period.
CLEF’06: C327
Title: Earthquakes in Mexico City
Description: Find documents that provide details on the impact of or the damage caused by earthquakes in Mexico City in 1995.
Narrative: Relevant document should contain some information on earthquakes in Mexico City in 1995, such as their magnitude, damages caused, panic of the inhabitants, etc. Documents on earthquakes in other places in Mexico are not relevant unless the seismic impact was also felt in Mexico City.
Relevant Document
<DOCNO> LA121194-0313 </DOCNO>
<DOCID> 107228 </DOCID>
December 11, 1994, Sunday, Home Edition
A magnitude 6.3 earthquake rocked Mexico City, causing people to flee their homes in fear. There were no immediate reports of injuries or severe damage. The U.S. Geological Survey's National Earthquake Information Center in Golden, Colo., said the quake's epicenter was in Petatlan in the southwestern state of Guerrero.
Date-Based Document Restriction
Identify dates (if any) in the query.
Restrict candidate documents to the set of documents coming from the same time period.
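The two steps above can be sketched as follows. This is a minimal illustration under stated assumptions: dates are detected as four-digit years with a regular expression, "same time period" is taken to mean the same calendar year, and `window_years` is a hypothetical tolerance parameter that the slides do not mention.

```python
import re
from datetime import date

# Assumption: a "date" in the query is a four-digit year from 1900 to 2099.
YEAR_RE = re.compile(r"\b(?:19|20)\d{2}\b")


def extract_years(query):
    """Return the set of years mentioned in the query text."""
    return {int(match.group(0)) for match in YEAR_RE.finditer(query)}


def restrict_by_date(query, docs, window_years=0):
    """Keep only documents dated within window_years of a year in the query.

    docs: list of (doc_id, datetime.date) pairs.
    If the query contains no year, all documents are kept.
    """
    years = extract_years(query)
    if not years:
        return list(docs)
    return [
        (doc_id, doc_date)
        for doc_id, doc_date in docs
        if any(abs(doc_date.year - year) <= window_years for year in years)
    ]
```

Note that the relevant document shown for topic C327 (LA121194-0313) is dated December 11, 1994, while the topic asks about 1995, so a strict same-year filter in this sketch would drop it; a wider window trades that recall back for less restriction.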
FIRE 2008: Relevant Docs
Topic | Relevant docs from a different time period
44 | 11/56
47 | 23/32
48 | 70/76
50 | 18/61
52 | 2/38
73 | 10/53
FIRE 2008: Hindi-English MAP
Queries | Without DR | With DR
Short | 0.2616 (unofficial) | 0.2601 (unofficial)
Long | 0.4351 (unofficial) | 0.4140 (official)