Interactive Probabilistic Search for GikiCLEF

Ray R Larson

School of Information

University of California, Berkeley

CLEF 2009 -- Corfu, Greece September 21, 2007

GikiCLEF Task

Perform QA style retrieval for complex questions with geographic elements

Task specifications were not really clear (to me at least): only Wikipedia article titles that WERE answers were acceptable, not articles that CONTAINED answers if the article “type” was wrong

GikiCLEF Task

Questions were VERY complex, such as:

Which countries have the white, green and red colors in their national flag?

Which authors were born in and write about the Bohemian Forest?

What Belgians won the Ronde van Vlaanderen exactly twice?

List the left side tributaries of the Po river

Approach to GikiCLEF

We had no idea how to handle many of these questions

So, we decided to devote our participation to exploring approaches via an interactive interface to the Cheshire II system

We wanted to see which techniques would (and would not) be effective in suggesting documents with relevant content, at least until we realized that relevant content was not a relevant answer

Adapting Cheshire II for GikiCLEF

For this task we created an interactive version of the database, which included:

Multiple indexes, and a re-implementation of the links in the Wikipedia corpus as title searches that retrieve both identical and similar titles

Cross-searching between the different language corpora, since some parts of some queries were better searched in specific languages

For translation, I relied on semi-intelligent translations (me) and occasionally Babelfish
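The idea of turning a link into a title search that returns both identical and similar titles can be sketched as follows. This is a minimal illustration, not the Cheshire II implementation: the helper `title_search`, its `n` and `cutoff` parameters, and the use of Python's `difflib` similarity ratio as the notion of "similar" are all assumptions made for the example.

```python
import difflib
from typing import List


def title_search(link_text: str, titles: List[str], n: int = 5, cutoff: float = 0.8) -> List[str]:
    """Hypothetical helper: resolve a link's text to article titles.

    Exact (case-insensitive) matches come first, followed by similar
    titles ranked by difflib's SequenceMatcher similarity ratio.
    """
    key = link_text.casefold()
    # Map case-folded titles back to their original spelling.
    by_key = {t.casefold(): t for t in titles}
    exact = [by_key[key]] if key in by_key else []
    # get_close_matches keeps up to n candidates with ratio >= cutoff, best first.
    similar = difflib.get_close_matches(key, list(by_key), n=n, cutoff=cutoff)
    return exact + [by_key[k] for k in similar if k != key]


titles = ["Po (river)", "Po River", "Ticino (river)", "Adda (river)"]
print(title_search("Po River", titles))  # ['Po River', 'Po (river)']
```

The cutoff controls how loose "similar" is; a real corpus would more likely use an index over normalized titles than a linear scan, so treat this only as a picture of the matching behavior.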

Not all topics are so simple

Some topics require multiple background searches to help determine possible answers. E.g. searching for the football teams of all South American countries requires that you know all the South American countries

Wikipedia List pages are often available for these kinds of questions, but they are not themselves considered relevant in this task
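The two-step pattern (first enumerate the entities, then search for each one while discarding the list pages themselves) might be mechanized roughly like this. It is a hedged sketch only: `two_stage_search`, `is_list_page`, the "List of " title prefix heuristic, and the toy index are hypothetical stand-ins, and any real per-entity search function could be passed in.

```python
from typing import Callable, Dict, List


def is_list_page(title: str) -> bool:
    # Heuristic assumption: Wikipedia list articles are titled "List of ...".
    return title.startswith("List of ")


def two_stage_search(
    entities: List[str],                    # stage 1: entities read off a background list page
    search: Callable[[str], List[str]],     # stage 2: any per-entity title search
) -> Dict[str, List[str]]:
    """Hypothetical sketch: expand one question over a background entity list.

    List pages are dropped from the answers, since they are not accepted
    as relevant in this task.
    """
    answers: Dict[str, List[str]] = {}
    for entity in entities:
        hits = [t for t in search(entity) if not is_list_page(t)]
        if hits:
            answers[entity] = hits
    return answers


# Toy index standing in for a real search engine (illustrative data only).
toy_index = {
    "Brazil": ["Brazil national football team", "List of football clubs in Brazil"],
    "Chile": ["Chile national football team"],
}
countries = ["Brazil", "Chile", "Peru"]
result = two_stage_search(countries, lambda c: toy_index.get(c, []))
print(result)
# {'Brazil': ['Brazil national football team'], 'Chile': ['Chile national football team']}
```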

For example…

NOT RELEVANT ANSWER?

RELEVANT, but no way to verify without the flag page or images and computer vision analysis

Multilingual Search

Conducted English search first and identified what I thought were relevant items

Then did the same with each other language using the English results as a guide, but open to new relevant items not in the English collections

Usually relied on translation approximations and cognates. E.g. Bulgarian had enough similarities to Russian for me to get a sense of the meaning

Search Methods

Basic ranked search was not very effective on its own

Using ranked search with all terms required (Boolean constraint) was more effective

Sometimes the best approach was a simple Boolean exact match on names, but when going across languages ranked approximate searches may be needed too

Definitely need a type classification index for pages that can be used to constrain results

Probably need to use link indexes more often too (to find pages of the correct type that link to the relevant page)
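To make the difference between plain ranked search, ranked search with all terms required, and a page-type constraint concrete, here is a small sketch. The scoring is a simple TF-IDF stand-in, not Cheshire II's own ranking function, and the `(title, page_type, text)` record layout, the `page_type` labels, and the toy pages are assumptions for illustration; they echo the flag-colors topic.

```python
import math
from collections import Counter
from typing import List, Optional, Tuple

# Hypothetical record layout: (title, page_type, text).
Doc = Tuple[str, str, str]


def tokenize(text: str) -> List[str]:
    return text.lower().split()


def ranked_search(query: str, docs: List[Doc], require_all: bool = True,
                  page_type: Optional[str] = None) -> List[Tuple[str, float]]:
    """Rank docs by a simple TF-IDF score (a stand-in, not Cheshire II's ranking).

    require_all: apply a Boolean AND over the query terms before ranking.
    page_type:   keep only pages whose type label matches (type-index constraint).
    """
    terms = tokenize(query)
    tfs = [Counter(tokenize(text)) for _, _, text in docs]
    n = len(docs)
    # Document frequency of each query term, used for the idf weight.
    df = {t: sum(1 for tf in tfs if t in tf) for t in set(terms)}
    results = []
    for (title, ptype, _), tf in zip(docs, tfs):
        if page_type is not None and ptype != page_type:
            continue  # wrong page type: filtered out, not merely down-ranked
        if require_all and not all(t in tf for t in terms):
            continue  # Boolean constraint: every query term must occur
        # idf = log((n + 1) / (df + 0.5)) stays positive because df <= n.
        score = sum(tf[t] * math.log((n + 1) / (df[t] + 0.5)) for t in terms if t in tf)
        if score > 0:
            results.append((title, score))
    return sorted(results, key=lambda r: r[1], reverse=True)


# Illustrative toy pages only.
docs = [
    ("Italy", "country", "flag green white red"),
    ("Flag of Italy", "flag", "italy flag green white red tricolour"),
    ("Ireland", "country", "flag green white orange"),
]
print(ranked_search("green white red", docs, page_type="country"))  # only Italy survives
print(ranked_search("green white red", docs, require_all=False))    # all three, Ireland last
```

With the Boolean constraint and the type filter, the partial match (Ireland) and the wrong-type page (Flag of Italy) both disappear; without them, both still show up in the ranked list, which mirrors the observation above that basic ranked search alone was not very effective.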

Limitations

Interactive search was a very slow process: it often took several hours per question

The short test period fell during a family vacation. It is more fun to go out to dinner at a nice restaurant than to try to translate Bulgarian

Limitations

As a result of the preceding constraints I was only able to complete 22 of 50 topics, but each included all languages whenever possible

In spite of this (and since scoring penalizes wrong answers as well as rewarding correct ones…) I managed to score pretty well

Results from GikiCLEF Site

Conclusions

The interactive approach revealed some search strategies that might be exploited automatically

Automatic approaches that rely only on conventional IR techniques will probably continue to lag behind the “knowledge-based” approaches used for this task

The trick for the future will be to combine the two effectively