Survey Jaehui Park 2008. 07. 17.. Copyright 2008 by CEBT Introduction Members Jung-Yeon Yang,...

Survey

Jaehui Park
2008. 07. 17.

Copyright 2008 by CEBT

Introduction

Members
– Jung-Yeon Yang, Jaehui Park, Sungchan Park, Jongheum Yeon

We are interested in
– Issues in information retrieval: crawling, indexing, searching, and ranking methods
– How to process multi-term queries in information retrieval environments
  Ex) "Today", "US Today", "Today Weather", "Paris Today Weather"

-> Multi-term queries express more complex information needs than single-term queries.



Main Topic

Long Queries in Keyword Search

Keywords:
– Compound query, Evidence Combination, Phrasal Query, Multi-term Query, Multiple Keyword Search, Multiword Unit, and so on.

Issues
– proximity or distance
– syntactic structure (order)
– semantic
– NLP remedies
– …



Proximity

An intuitive concept for processing multiple-term queries

Readings

Term Proximity Scoring for Keyword-Based Retrieval Systems
– [ECIR 2003] Yves Rasolofo and Jacques Savoy

Efficiency vs. Effectiveness in Terabyte-Scale Information Retrieval
– [TREC 2005] Stefan Büttcher and Charles L. A. Clarke

Efficient Text Proximity Search
– [SPIRE 2007] Ralf Schenkel, et al.

Why Bigger Windows Are Better Than Smaller Ones
– [TR-UM 1997] Ron Papka and James Allan

…


Term Proximity Scoring for Keyword-Based Retrieval Systems

Yves Rasolofo and Jacques Savoy
European Colloquium on IR Research (ECIR) 2003, LNCS 2633

2008. 07. 17.
Presented by Jaehui Park


Introduction

Phrase, term proximity, or term distance in IR

Focus on adding a word-pair scoring module
– Okapi probabilistic model + proximity measurement

Previous work

Salton & McGill [1983]
– Generating statistical phrases based on word co-occurrence

Fagan [1987]
– Considering syntactic relations or syntactic structures

Mitra et al. [1997]
– “Once a good basic ranking scheme is used, the use of phrases do not have a major effect on precision at high ranks”

Arampatzis et al. [2000]
– The lack of success when using NLP techniques in IR

Hawking & Thistlewaite [1996]
– The use of proximity scoring within the PADRE system (Z-mode method)



Okapi

Okapi [Robertson & Sparck Jones 1976]
– Ranks documents by their relevance to a given search query, based on the probabilistic retrieval model

Considering
– Term frequency
– Document length

The weight for a given term ti in document d
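
The formula for this weight is not reproduced in the text above. As a rough sketch only, assuming the widely used BM25 form of the Okapi weight, the two factors listed (term frequency and document length) could be combined as follows. The function name, the constants k1 and b, and their default values are illustrative assumptions, not values taken from the slides.

```python
def okapi_term_weight(tf, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """BM25-style weight of one term in one document (assumed form).

    tf          -- occurrences of the term in the document
    doc_len     -- length of the document (e.g. in words)
    avg_doc_len -- mean document length in the collection
    k1, b       -- tuning constants (assumed defaults, not from the slides)
    """
    if tf <= 0:
        return 0.0
    # Length normalisation: longer-than-average documents get a larger K,
    # which lowers the weight for the same term frequency.
    K = k1 * ((1.0 - b) + b * doc_len / avg_doc_len)
    # Saturating function of tf: grows with tf but never exceeds k1 + 1.
    return (k1 + 1.0) * tf / (K + tf)
```

Under these assumptions, a term occurring once in a document of exactly average length gets weight 1.0, and the weight approaches k1 + 1 as the term frequency grows.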



Okapi

Okapi [Robertson & Sparck Jones 1976] (continued)

The weight for the term ti within a query

The retrieval status value (for a document according to a query)
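
The formulas for these two quantities are likewise missing from the text. A minimal sketch, assuming the common BM25 formulation: the query-side weight combines a saturating query term frequency factor with a Robertson/Sparck Jones style inverse document frequency, and the retrieval status value is the sum over query terms of document weight times query weight. The function names, the constant k3, and the 0.5 smoothing terms are assumptions for illustration.

```python
import math


def query_term_weight(qtf, n_docs, doc_freq, k3=1000.0):
    """Weight of a term within a query (assumed BM25-style form).

    qtf      -- occurrences of the term in the query
    n_docs   -- number of documents in the collection
    doc_freq -- number of documents containing the term
    k3       -- tuning constant (assumed default)
    """
    # Rare terms (small doc_freq) receive a larger idf.
    idf = math.log((n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    return (k3 + 1.0) * qtf / (k3 + qtf) * idf


def retrieval_status_value(doc_weights, query_weights):
    """Sum of document-weight * query-weight over the query terms.

    doc_weights   -- {term: weight of the term in the document}
    query_weights -- {term: weight of the term in the query}
    Query terms absent from the document contribute nothing.
    """
    return sum(doc_weights.get(term, 0.0) * qw
               for term, qw in query_weights.items())
```

Documents are then ranked by decreasing retrieval status value for the query.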



Term Proximity Weighting

Improving retrieval performance by using term proximity scoring

Assumption
– If a document contains sentences with at least two query terms in them, the document is more likely to be relevant.
– The closer the query terms are to each other, the higher the probability of relevance.

Objective
– Assign more importance to keywords whose occurrences are a short distance apart.



Term Proximity Weighting

1. Expand the request (query) using keyword pairs extracted from the query’s wording

2. Compute a term pair instance weight
– “information retrieval”: 1.0
– “the retrieval of medical information”: 0.11 (1/9)
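
Steps 1 and 2 can be sketched as follows. The two example values are consistent with an instance weight of 1/d², where d is the distance in word positions between the two terms (d = 1 gives 1.0, d = 3 gives 1/9); that inverse-square form is inferred from the examples rather than stated in the text. The function names are hypothetical, and taking every unordered pair of distinct query terms is an assumption about how pairs are extracted.

```python
from itertools import combinations


def keyword_pairs(query_terms):
    """Step 1: all unordered pairs of distinct query terms (assumed rule)."""
    unique_terms = list(dict.fromkeys(query_terms))  # drop repeats, keep order
    return list(combinations(unique_terms, 2))


def term_pair_instance_weight(pos_a, pos_b):
    """Step 2: weight of one co-occurrence, assumed to be 1 / distance^2."""
    distance = abs(pos_a - pos_b)
    if distance == 0:
        raise ValueError("the two occurrences must be at different positions")
    return 1.0 / (distance * distance)


# The second example from the slide: the two terms are 3 positions apart.
words = "the retrieval of medical information".split()
weight = term_pair_instance_weight(words.index("retrieval"),
                                   words.index("information"))
print(round(weight, 2))  # 0.11
```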



Term Proximity Weighting

3. Sum the instance weights of all occurrences of each term pair

4. Compute the contribution of all occurring term pairs in the document

5. Compute the final retrieval status value
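
The formulas for steps 3 to 5 are not reproduced in the text, so the following is only one plausible reading. It assumes that instance weights are 1/d², that only occurrences within a maximum distance count, that the summed weight is passed through an Okapi-like saturating function, and that the proximity score is added to the Okapi score. All names, the constants K and k1, and the window of 5 words are illustrative assumptions.

```python
def pair_weight_sum(positions_a, positions_b, max_distance=5):
    """Step 3: sum 1/d^2 over all occurrences of a pair within the window."""
    total = 0.0
    for pos_a in positions_a:
        for pos_b in positions_b:
            distance = abs(pos_a - pos_b)
            if 0 < distance <= max_distance:
                total += 1.0 / (distance * distance)
    return total


def pair_contribution(weight_sum, K=1.2, k1=1.2):
    """Step 4: saturate the summed weight (assumed Okapi-like form)."""
    return (k1 + 1.0) * weight_sum / (K + weight_sum)


def final_rsv(okapi_rsv, positions, pair_query_weights, max_distance=5):
    """Step 5: Okapi score plus the proximity score of all query term pairs.

    positions          -- {term: [word positions of the term in the document]}
    pair_query_weights -- {(term_a, term_b): query-side weight of the pair}
    """
    proximity_score = 0.0
    for (term_a, term_b), pair_qw in pair_query_weights.items():
        weight_sum = pair_weight_sum(positions.get(term_a, []),
                                     positions.get(term_b, []),
                                     max_distance)
        proximity_score += pair_contribution(weight_sum) * pair_qw
    return okapi_rsv + proximity_score
```

With this shape, a document that contains no query term pair within the window keeps its Okapi score unchanged, so the proximity module can only move documents with close term pairs up the ranking.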



Experiments

Test Collections

TREC-8 documents (528,155 docs)
– Financial Times, Federal Register, Foreign Broadcast Information Service, LA Times

TREC-9, TREC-10 (1,692,096 docs)



Experiments

Evaluation



Conclusion

The impact of a new term proximity algorithm on retrieval effectiveness for keyword-based systems was examined.
– Improves ranking for documents in which query term pairs occur within a given distance constraint.

The term proximity scoring approach
– Improves precision after retrieving a few documents
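
"Precision after retrieving a few documents" is commonly measured as precision at a small cutoff k. A minimal sketch of that measure, with a hypothetical function name and made-up document identifiers used purely for illustration:

```python
def precision_at_k(ranked_doc_ids, relevant_ids, k):
    """Fraction of the top-k ranked documents that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked_doc_ids[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant_ids)
    # Dividing by k (not len(top_k)) penalises rankings shorter than k.
    return hits / k


# Illustrative ranking and relevance judgements (not experimental data).
ranking = ["d3", "d1", "d7", "d2", "d9"]
relevant = {"d1", "d2", "d4"}
print(precision_at_k(ranking, relevant, 5))  # 0.4
```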

