Interactive Probabilistic Search for GikiCLEF
Ray R. Larson
School of Information
University of California, Berkeley
CLEF 2009 -- Corfu, Greece September 21, 2007
GikiCLEF Task
Perform QA style retrieval for complex questions with geographic elements
Task specifications were not really clear (to me at least) since only Wikipedia article titles that WERE answers were acceptable - and not articles that CONTAINED answers if the article “type” was wrong
GikiCLEF Task
Questions were VERY complex, such as:
  Which countries have the white, green and red colors in their national flag?
  Which authors were born in and write about the Bohemian Forest?
  What Belgians won the Ronde van Vlaanderen exactly twice?
  List the left side tributaries of the Po river
Approach to GikiCLEF
We had no idea about how to handle many of these questions
So, we decided to devote our participation to exploring approaches via an interactive interface to the Cheshire II system
We wanted to see what techniques would be effective (and which not) in suggesting documents with relevant content
  At least until we realized that relevant content was not a relevant answer
Adapting Cheshire II for GikiCLEF
For this task we created an interactive version of the database which included:
  Multiple indexes and re-implementation of links in the Wikipedia corpus as title searches to retrieve both identical and similar titles
  Cross-searching between the different language corpora
    Some parts of some queries were better searched in specific languages
    Relied on semi-intelligent translations (me) and occasionally Babelfish
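A minimal sketch of treating a link as a title search that returns both identical and similar titles. It is not the Cheshire II implementation: the `normalize` and `title_search` helpers are hypothetical, the similarity measure is Python's standard `difflib` rather than anything used in the actual system, and the 0.6 cutoff is an arbitrary assumption.

```python
import difflib

def normalize(title):
    """Lowercase and treat underscores as spaces, as in Wikipedia link targets."""
    return " ".join(title.replace("_", " ").lower().split())

def title_search(link_target, titles, cutoff=0.6):
    """Return (identical, similar) titles for a link target.

    `titles` is a plain list standing in for a title index (an assumption).
    """
    wanted = normalize(link_target)
    by_norm = {}
    for t in titles:
        by_norm.setdefault(normalize(t), []).append(t)
    identical = by_norm.get(wanted, [])
    # Approximate matches on the normalized titles, excluding exact hits.
    close = difflib.get_close_matches(wanted, list(by_norm), n=5, cutoff=cutoff)
    similar = [t for key in close if key != wanted for t in by_norm[key]]
    return identical, similar

# Made-up titles for illustration only.
titles = ["Po River", "Po (river)", "Port", "Bohemian Forest"]
identical, similar = title_search("Po_river", titles)
```

Splitting the result into identical and similar titles keeps exact link targets separate from near misses, which a searcher can then inspect by hand.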
Not all topics are so simple
Some topics require multiple background searches to help determine possible answers
  E.g. searching for the football teams of all South American countries requires that you know all South American countries
  Often Wikipedia list pages are available for these kinds of questions -- but are not themselves considered relevant in this task
For example…
RELEVANT - But no way to verify without the flag page or images and computer vision analysis
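The multi-step topics described earlier (find the background set first, then search once per member) can be sketched roughly as follows. All data and names here (`LIST_PAGES`, `TITLE_INDEX`, `answer_candidates`) are hypothetical stand-ins for index lookups. The list page is used only to enumerate entities and is never returned as an answer, since list pages were not considered relevant in this task.

```python
# Hypothetical background knowledge taken from a Wikipedia list page.
LIST_PAGES = {
    "List of South American countries": ["Argentina", "Brazil", "Chile"],
}

# Hypothetical title index mapping page title -> page type.
TITLE_INDEX = {
    "Argentina national football team": "football team",
    "Brazil national football team": "football team",
    "Football in Chile": "sport overview",
    "Argentina": "country",
}

def background_entities(list_title):
    """Step 1: background search that enumerates the entities to look up."""
    return LIST_PAGES.get(list_title, [])

def answer_candidates(list_title, keyword, wanted_type):
    """Step 2: one search per entity, keeping only pages of the wanted type."""
    answers = []
    for entity in background_entities(list_title):
        for title, page_type in TITLE_INDEX.items():
            if (entity in title
                    and keyword in title.lower()
                    and page_type == wanted_type):
                answers.append(title)
    return sorted(answers)

teams = answer_candidates(
    "List of South American countries", "football", "football team"
)
```

In this toy data, "Football in Chile" mentions the right country and keyword but has the wrong page type, so it is dropped, mirroring the rule that a page containing the answer is not itself an acceptable answer.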
Multilingual Search
Conducted English search first and identified what I thought were relevant items
Then did the same with each other language using the English results as a guide, but open to new relevant items not in the English collections
Usually relied on translation approximations and cognates
  E.g. Bulgarian had enough similarities to Russian to be able to get a sense of the meaning
Search Methods
Basic ranked search was not very effective on its own
Using ranked search with all terms required (Boolean constraint) was more effective
Sometimes the best approach was a simple Boolean exact match on names
  But when going across languages may need ranked approximate searches too
Definitely need a type classification index for pages that can be used to constrain results
Probably need to use link indexes more often too (to find pages of the correct type that link to the relevant page)
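A toy illustration of the difference between plain ranked search, ranked search with all terms required, and an additional page-type constraint. This is a sketch under stated assumptions: the corpus is made up, the `search` function is hypothetical, and the score is a simple term-frequency count standing in for a real ranking function, not the probabilistic ranking used by Cheshire II.

```python
from collections import Counter

# Made-up pages: (title, page type, text).
PAGES = [
    ("Italy", "country", "italy flag green white red tricolour"),
    ("Flag of Italy", "flag", "flag italy green white red"),
    ("Hungary", "country", "hungary flag red white green"),
    ("France", "country", "france flag blue white red"),
]

def tokenize(text):
    return text.lower().split()

def score(query_terms, doc_terms):
    """Stand-in ranking score: summed query-term frequencies."""
    counts = Counter(doc_terms)
    return sum(counts[t] for t in query_terms)

def search(query, require_all=False, page_type=None):
    """Ranked search, optionally constrained by Boolean AND and page type."""
    terms = tokenize(query)
    hits = []
    for title, ptype, text in PAGES:
        doc_terms = tokenize(text)
        if page_type is not None and ptype != page_type:
            continue  # type classification constraint
        if require_all and not all(t in doc_terms for t in terms):
            continue  # Boolean constraint: every query term must occur
        s = score(terms, doc_terms)
        if s > 0:
            hits.append((s, title))
    # Highest score first; ties broken alphabetically by title.
    hits.sort(key=lambda h: (-h[0], h[1]))
    return [title for _, title in hits]

ranked_only = search("white green red flag")
all_required = search("white green red flag", require_all=True)
typed = search("white green red flag", require_all=True, page_type="country")
```

Plain ranking still returns "France", which matches only some of the colors. Requiring all terms removes it, and the type constraint then removes "Flag of Italy", which contains the answer but is not a page of the requested type.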
Limitations
Interactive search was a very slow process
  Often took several hours per question
The short test period fell during a family vacation
  It is more fun to go out to dinner at a nice restaurant than to try to translate Bulgarian
Limitations
As a result of the preceding constraints I was only able to complete 22 of 50 topics
  But each included all languages whenever possible
In spite of this (and since scoring penalizes wrong answers as well as rewarding correct ones…) managed to score pretty well
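The slides do not give the scoring formula. As an assumption for illustration only, the sketch below uses a score of the form C * C / N (correct answers times precision), where C is the number of correct answers and N the number of answers returned. It shows why answering fewer topics carefully can still score well under a measure that penalizes wrong answers. The function name and the numbers are hypothetical, not actual results.

```python
def precision_weighted_score(correct, returned):
    """Assumed score: correct answers multiplied by precision (C * C / N)."""
    if returned == 0:
        return 0.0
    return correct * correct / returned

# Same number of correct answers, different amounts of noise (made-up numbers).
careful = precision_weighted_score(correct=10, returned=10)
noisy = precision_weighted_score(correct=10, returned=40)
```

Under this assumed measure, the careful run scores four times higher than the noisy one even though both found the same ten correct answers.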
Conclusions
The interactive approach showed some search strategies that might be exploited automatically
Automatic approaches that rely only on conventional IR techniques will probably continue to lag the “knowledge-based” approaches used for this task
The trick for the future will be trying to effectively combine the two