Constructing Query Models from Elaborate Query Formulations
A Few Examples Go A Long Way
Krisztian Balog
Wouter Weerkamp
Maarten de Rijke
ISLA, University of Amsterdam
Presented by Tanvi Motwani
AIM
• This paper introduces and compares several methods for sampling expansion terms, using both query-independent and query-dependent techniques.
• In addition to the query, the approach takes sample documents as input. Sample documents are additional information provided by the user, consisting of a small number of "key references" (pages that should be linked to by a good overview page of the topic).
• The aim is to increase "aspect recall" by attempting to uncover aspects of the information need that are not captured by the query but are present in the sample documents.
Aspect Retrieval
Query: What are current applications of robotics?
Find as many different applications as possible.
Example Aspects
A1: spot-welding robotics
A2: controlling inventory
A3: pipe-laying robots
A4: talking robot
A5: robots for loading & unloading memory tapes
A6: robot telephone operators
A7: robot cranes
… …
Aspect judgments:
      A1  A2  A3  A4  …  A(k-1)  Ak
d1     1   1   0   0  …    0     0
d2     0   1   1   1  …    0     0
d3     0   0   0   0  …    1     0
…
dk     1   0   1   0  …    0     1
Overview
• Retrieval Model
  – Query Likelihood
  – Document Modeling
  – Query Modeling
• Experimental Setup
• Query Representation
• Baseline Parameters
• Experimental Evaluation
Query Likelihood: an example
Query (Q): "What is a Rainforest?" Documents are ranked by P(D|Q):
P(D1|Q) = 0.32
P(D2|Q) = 0.26
P(D3|Q) = 0.19
P(D4|Q) = 0.12
P(D5|Q) = 0.09
Query Likelihood
• Bayes' Rule: P(D|Q) = P(Q|D) P(D) / P(Q)
• Ignoring P(Q), which is constant for a given query: P(D|Q) ∝ P(Q|D) P(D)
• Assuming independence of query terms: P(Q|D) = Π_t P(t|D)^n(t,Q)
• Taking logs: log P(D|Q) ∝ log P(D) + Σ_t n(t,Q) log P(t|D)
• Using query and document models: Score(Q, D) = Σ_t P(t|θ_Q) log P(t|θ_D)
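To make the derivation concrete, here is a minimal sketch (my illustration, not code from the slides) of ranking documents by query likelihood. The toy document models and the uniform document prior are assumptions, and each P(t|θ_D) is assumed to be already smoothed so it is nonzero.

```python
import math

def query_likelihood_score(query_terms, doc_model, doc_prior=1.0):
    # log P(D|Q) ∝ log P(D) + Σ_t n(t,Q) log P(t|θ_D)
    # doc_model maps each term to a smoothed, nonzero probability P(t|θ_D).
    return math.log(doc_prior) + sum(math.log(doc_model[t]) for t in query_terms)

# Toy example for Q = "What is a Rainforest?" (stopwords removed):
query = ["rainforest"]
doc_models = {
    "D1": {"rainforest": 0.08},
    "D2": {"rainforest": 0.03},
}
ranking = sorted(doc_models,
                 key=lambda d: query_likelihood_score(query, doc_models[d]),
                 reverse=True)
print(ranking)  # ['D1', 'D2']
```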
Underlying Relevance Model
• The query (e.g., "What is a Rainforest?") and the relevant documents are random samples from an underlying relevance model R.
• Documents are ranked based on the similarity of their document models to the query model; the Kullback-Leibler divergence between the query and document models can be used to provide this ranking.
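As a sketch of this KL-based ranking (again my illustration, not code from the paper), documents can be scored by the negative KL divergence between the query model and each document model; terms with zero query-model probability contribute nothing, and document-model probabilities are assumed smoothed and nonzero.

```python
import math

def neg_kl_score(query_model, doc_model):
    # -KL(θ_Q || θ_D) = Σ_t P(t|θ_Q) * log( P(t|θ_D) / P(t|θ_Q) )
    # Higher (closer to zero) means the document model is closer
    # to the query model, so sorting descending gives the ranking.
    return sum(p_q * math.log(doc_model[t] / p_q)
               for t, p_q in query_model.items() if p_q > 0)
```

With a one-term query model this reduces to the query-likelihood ranking above, which is why the two views give consistent rankings.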
Document Modeling
• Maximum likelihood estimate: P(t|D) = n(t,D) / |D|, where n(t,D) is the count of term t in D and |D| is the document length.
• Smoothing the ML estimate: a document about rainforests that never contains the word "Rain" has P("Rain"|D) = 0 under the ML estimate, thus smoothing is required.
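One standard way to do this smoothing is Jelinek-Mercer interpolation with the collection model; the slides do not specify which smoothing method the paper uses, so this is a sketch of one common choice.

```python
from collections import Counter

def smoothed_doc_model(doc_terms, collection_model, lam=0.1):
    # P(t|θ_D) = (1 - lam) * n(t,D)/|D| + lam * P(t|C)
    # Any term with nonzero collection probability now has nonzero
    # probability in the document model, even if it never occurs in D.
    counts = Counter(doc_terms)
    doc_len = len(doc_terms)
    return {t: (1 - lam) * counts.get(t, 0) / doc_len + lam * p_c
            for t, p_c in collection_model.items()}
```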
Query Modeling
• The maximum likelihood estimate of P(t|Q) is extremely sparse, thus query expansion is necessary.
• A relevant document may not contain the words "Rain" or "Forest" but may contain related terms such as "Wildlife". Expanding the query brings in different "aspects" of the topic.
Experimental Setup
• CSIRO Enterprise Research Collection (CERC), a crawl of the *.csiro.au website conducted in March 2007.
• 370,715 documents
• 4.2 gigabytes in size
• 50 topics
• Judgments made on a 3-point scale: 2 = highly relevant "key reference"; 1 = candidate key page; 0 = not a "key reference".
Parameter Estimation
Maximizing Average Precision (MAX_AP)
Maximizing Query Log likelihood (MAX_QLL)
Best Empirical Estimate (EMP_BEST)
Evaluation
• The maximum AP score is reached when the weight is 0.6.
• MAX_QLL performs slightly better than MAX_AP.
Query Representation
Three approaches are explored: feedback using relevance models, relevance models from sample documents, and query models from sample documents.
• The expanded query terms are combined with the original query terms (a sketch of this combination follows below).
• This prevents the topic from drifting away from the original user information need.
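A minimal sketch of this combination, assuming the usual linear interpolation with a mixing weight lam (the slides do not give the exact weighting scheme used in the paper):

```python
def combine_query_models(original, expanded, lam=0.5):
    # P(t|θ_Q) = lam * P(t|Q_original) + (1 - lam) * P(t|Q_expanded)
    # Keeping lam > 0 anchors the model to the original query terms,
    # which prevents topic drift from aggressive expansion.
    terms = set(original) | set(expanded)
    return {t: lam * original.get(t, 0.0) + (1 - lam) * expanded.get(t, 0.0)
            for t in terms}
```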
Feedback Using Relevance Models
The probability of an expansion term t is the joint probability of observing t together with the query terms q1, q2, …, qk, divided by the joint probability of the query terms:
P(t | q1…qk) = P(t, q1…qk) / P(q1…qk)
• RM1: t and the qi are assumed to be sampled independently and identically from each document:
P(t, q1…qk) = Σ_D P(D) P(t|D) Π_i P(qi|D)
• RM2: the qi are sampled dependent on t, but independently of each other:
P(t, q1…qk) = P(t) Π_i Σ_D P(D|t) P(qi|D)
RM1
An example: assume the smoothing weight is 0; "wild" appears 5 times, "rain" 20 times, and "forest" 30 times in this document; the document has 150 unique terms; M is just this single document; and P(D1) = 1/5. Then:
P("wild", "rain", "forest") = 1/5 * 5/150 * 20/150 * 30/150
RM2
Given the term "wild", we first pick a document from the set M with probability P(D|t) and then sample query words from that document.
An example: assume P(D | "wild") = 0.7 and P("wild") = 0.2; this document contains 10 occurrences of "rain" and 20 of "forest", and has 200 unique terms; M is just this document. Then:
P("wild", "rain", "forest") = 0.2 * 0.7 * 20/200 * 10/200
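Both estimators can be sketched directly from the definitions above (my illustration; the document models are assumed smoothed, and P(t) and P(D|t) are assumed given rather than estimated here):

```python
def rm1_joint(t, query_terms, doc_models, doc_prior):
    # RM1: P(t, q1..qk) = Σ_D P(D) * P(t|D) * Π_i P(qi|D)
    # doc_models: doc id -> term distribution; doc_prior: doc id -> P(D).
    total = 0.0
    for d, model in doc_models.items():
        joint = doc_prior[d] * model.get(t, 0.0)
        for q in query_terms:
            joint *= model.get(q, 0.0)
        total += joint
    return total

def rm2_joint(t, query_terms, doc_models, p_t, p_d_given_t):
    # RM2: P(t, q1..qk) = P(t) * Π_i Σ_D P(D|t) * P(qi|D)
    score = p_t
    for q in query_terms:
        score *= sum(p_d_given_t[d] * doc_models[d].get(q, 0.0)
                     for d in doc_models)
    return score
```

Expansion terms are the top-scoring candidates t after normalizing by the joint probability of the query terms. Setting doc_models to the sample set S (with doc_prior[d] = 1/|S| for RM1) gives the "relevance models from sample documents" variant of the next section.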
Relevance Models from Sample Documents
• Apply the relevance models to the sample documents instead of the feedback documents, i.e., set M = S.
• For RM1, assume P(D) = 1/|S|.
Query Model from Sample Documents
The top K terms with the highest probability P(t|S) are used to formulate the expanded query. The generative process:
1. Take the sample document set S.
2. Select a document D from S with probability P(D|S).
3. From that document, generate term t with probability P(t|D).
4. Sum over all sample documents to obtain P(t|S) = Σ_D P(t|D) P(D|S).
Query Model from Sample Documents
• Maximum Likelihood Estimate of a term (EX-QM-ML)
• Smoothed Estimate of a term (EX-QM-SM)
• Ranking Function proposed by Ponte and Croft for unsupervised query expansion (EX-QM-EXP)
Query Model from Sample Documents
Three options for estimating P(D|S):
• Uniform: every sample document is equally important, P(D|S) = 1/|S|.
• Query-biased: sample documents that better match the original query receive more weight.
• Inverse query-biased: sample documents that match the query less well receive more weight, favoring aspects not already captured by the query. (A sketch of these options follows below.)
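Here is a sketch of the P(t|S) computation with pluggable document importance. The uniform weighting follows the definition above exactly; the query-biased weighting is one plausible reading of the name (weight proportional to the query likelihood of each sample document), not a formula from the slides.

```python
def query_model_from_samples(sample_models, p_d_given_s):
    # P(t|S) = Σ_{D in S} P(t|D) * P(D|S)
    # sample_models: doc id -> term distribution P(t|D);
    # p_d_given_s: doc id -> importance weight P(D|S), summing to 1.
    p_t_s = {}
    for d, model in sample_models.items():
        for t, p in model.items():
            p_t_s[t] = p_t_s.get(t, 0.0) + p_d_given_s[d] * p
    return p_t_s

def uniform_weights(sample_models):
    # Uniform option: P(D|S) = 1/|S|
    return {d: 1.0 / len(sample_models) for d in sample_models}

def query_biased_weights(sample_models, query_terms):
    # Query-biased option (assumed reading): P(D|S) ∝ Π_i P(qi|D)
    raw = {}
    for d, model in sample_models.items():
        w = 1.0
        for q in query_terms:
            w *= model.get(q, 0.0)
        raw[d] = w
    z = sum(raw.values()) or 1.0
    return {d: w / z for d, w in raw.items()}

# Expanded query = top-K terms by probability:
# sorted(p_t_s, key=p_t_s.get, reverse=True)[:K]
```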
Experimental Evaluation
Results (presented as tables and plots in the original slides) cover: expanded query models, combination with the original query, the importance of the sample documents, topic-level comparison, and sampling conditioned on the query.
Conclusion
• Introduced a method for sampling query expansion terms in a query-independent way, based on sample documents that reflect "aspects" of the user's information need that are not captured by the query.
• Introduced different versions of the expansion term selection method, based on different term selection and document importance weighting methods, and compared them against more traditional query expansion performed in a query-biased manner.
Questions/Discussion
• Every topic needs a sample document set; is this method feasible in real-world domains where there are countless topics?
• Aspect recall is obtained from the sample documents; aren't we dependent on the "goodness" of the sample documents, i.e., the number of different aspects they cover, for obtaining high aspect recall?
• There is only a slight increase in the MAP measure compared to BFB-RM2 (around 0.07); for an end user, will this make any noticeable difference in experience? Is such a small gain in MAP worth the high cost of obtaining sample documents?