Post on 17-Jan-2016
Discovering Key Concepts in Verbose Queries
Michael Bendersky and W. Bruce Croft
University of Massachusetts
SIGIR 2008
Objective
• “Discovering Key Concepts in Verbose Queries”
• <num> Number 829
<title> Spanish Civil War support
<desc> Provide information on all kinds of material international support provided to either side in the Spanish Civil War
• Use of key concepts?
• Combine with current IR model
Retrieval Model
• Conventional Language Model:
$\mathrm{score}(q,d) = p(q \mid d) = \frac{p(q,d)}{p(d)}$
• New Model:
$\mathrm{score}(q,d) = p(q \mid d) = \frac{p(q,d)}{p(d)} = \frac{\sum_{c_i} p(q,d,c_i)}{p(d)}$
Final Retrieval Function
$\mathrm{score}(q,d) = \lambda\, p(q \mid d) + (1-\lambda) \sum_{c_i} p(c_i \mid q)\, p(c_i \mid d)$
• Language Model: $\lambda\, p(q \mid d)$
• Key Concepts: $(1-\lambda) \sum_{c_i} p(c_i \mid q)\, p(c_i \mid d)$
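The final retrieval function interpolates a language-model term p(q|d) with a key-concept term that sums p(ci|q) p(ci|d) over concepts, using an interpolation weight λ. A minimal Python sketch of that arithmetic follows. The function name, argument names, and all numbers are hypothetical and chosen only for illustration; how p(q|d) and p(ci|d) are estimated is not stated on the slide and is left to the caller.

```python
def interpolated_score(p_q_given_d, concept_weights, concept_doc_probs, lam):
    """Return lam * p(q|d) + (1 - lam) * sum_i p(c_i|q) * p(c_i|d).

    concept_weights:   concept -> p(c_i|q)
    concept_doc_probs: concept -> p(c_i|d) for one document
    """
    concept_part = sum(
        weight * concept_doc_probs.get(concept, 0.0)
        for concept, weight in concept_weights.items()
    )
    return lam * p_q_given_d + (1.0 - lam) * concept_part


# Hypothetical probabilities for a single document (not from the talk).
score = interpolated_score(
    p_q_given_d=0.02,
    concept_weights={
        "spanish civil war": 0.7,
        "material international support": 0.3,
    },
    concept_doc_probs={
        "spanish civil war": 0.05,
        "material international support": 0.01,
    },
    lam=0.5,
)
```

Setting `lam` to 1 recovers the plain language-model score, so the concept term can be read as a correction on top of the baseline.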
What is a Concept?
• Noun phrase in a query
• <num> Number 829
<title> Spanish Civil War support
<desc> Provide information on all kinds of material international support provided to either side in the Spanish Civil War
Finding ‘Key’ Concepts
• Rank concepts by p(ci|q)
• Compute p(ci|q) by frequency?
• <num> Number 829
<title> Spanish Civil War support
<desc> Provide information on all kinds of material international support provided to either side in the Spanish Civil War
• Approximate p(ci|q) by machine learning
• h(ci) is ci’s query-independent importance score
• $p(c_i \mid q) = \frac{h(c_i)}{\sum_{c_j \in q} h(c_j)}$
• ci → AdaBoost.M1 → h(ci)
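The normalization step, dividing each concept's score h(ci) by the sum of the scores of all concepts in the query, can be sketched as follows. This is only an illustration: the h values are made-up stand-ins for classifier output, and the concept strings are hypothetical noun phrases taken from the example query.

```python
def normalize_concept_scores(h_scores):
    """Turn importance scores h(c_i) into p(c_i|q) by dividing by their sum."""
    total = sum(h_scores.values())
    if total <= 0:
        raise ValueError("importance scores must have a positive sum")
    return {concept: score / total for concept, score in h_scores.items()}


# Hypothetical h(c_i) values; a real system would take them from the classifier.
h_scores = {
    "spanish civil war": 0.9,
    "material international support": 0.6,
    "all kinds": 0.3,
    "either side": 0.2,
}
p_concept_given_query = normalize_concept_scores(h_scores)

# Rank concepts by p(c_i|q), highest first.
ranked_concepts = sorted(
    p_concept_given_query, key=p_concept_given_query.get, reverse=True
)
```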
Features of a Concept
• is_cap : whether the concept is capitalized
• tf : term frequency in the corpus
• idf : inverse document frequency in the corpus
• ridf : residual idf (idf modified by a Poisson model)
• wig : weighted information gain; change in entropy from corpus to retrieved data
• g_tf : Google term frequency
• qp : number of times the concept appears as part of a query in MSN Live
• qe : number of times the concept appears as an exact query in MSN Live
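To make the idf and ridf features concrete, here is a small sketch using the standard residual-idf formulation: observed idf minus the idf expected if occurrences were spread over documents by a Poisson process. The base-2 logarithm, the function names, and the counts are assumptions for illustration; the slides do not give the exact formulas used.

```python
import math


def idf(doc_freq, n_docs):
    """Inverse document frequency: -log2(df / N)."""
    return -math.log2(doc_freq / n_docs)


def ridf(doc_freq, coll_freq, n_docs):
    """Residual idf: observed idf minus the idf predicted by a Poisson model.

    Under a Poisson model with rate cf / N, the expected fraction of documents
    containing the concept is 1 - exp(-cf / N).
    """
    expected_idf = -math.log2(1.0 - math.exp(-coll_freq / n_docs))
    return idf(doc_freq, n_docs) - expected_idf


# Hypothetical counts: both concepts occur in 10 of 1000 documents, but the
# second one is "bursty" (100 occurrences packed into those 10 documents).
evenly_spread = ridf(doc_freq=10, coll_freq=10, n_docs=1000)
bursty = ridf(doc_freq=10, coll_freq=100, n_docs=1000)
```

A bursty concept gets a higher ridf than an evenly spread one with the same document frequency, which is the kind of signal idf alone cannot provide.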
TREC Corpus
Exp 1: Identifying Key Concept
• Cross-validation on corpus
• Each fold has 50 queries
• Check whether the top concept is a key concept
• Assume 1 key concept per query during annotation
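The check described above, whether the top-ranked concept matches the single annotated key concept, amounts to an accuracy computation over the queries in a fold. A minimal sketch under that reading follows; the data structures, query ids, and scores are hypothetical.

```python
def top_concept_accuracy(predicted_scores, annotated_key_concept):
    """Fraction of queries whose top-scored concept equals the annotated one.

    predicted_scores:      query id -> {concept: score}
    annotated_key_concept: query id -> the single annotated key concept
    """
    hits = 0
    for query_id, scores in predicted_scores.items():
        top_concept = max(scores, key=scores.get)
        if top_concept == annotated_key_concept[query_id]:
            hits += 1
    return hits / len(predicted_scores)


# Hypothetical two-query fold, for illustration only.
predicted_scores = {
    "q1": {"spanish civil war": 0.45, "either side": 0.10},
    "q2": {"concept a": 0.20, "concept b": 0.60},
}
annotated_key_concept = {"q1": "spanish civil war", "q2": "concept a"}
accuracy = top_concept_accuracy(predicted_scores, annotated_key_concept)
```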
• Better than idf ranking
Exp 2: Information Retrieval
$\mathrm{score}(q,d) = \lambda\, p(q \mid d) + (1-\lambda) \sum_{c_i} p(c_i \mid q)\, p(c_i \mid d)$
• Use only the top 2 concepts for each query
• q is the entire <desc> section
• $\lambda = 0.8$
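The experimental setup (keep only the two highest-weighted concepts, interpolate with weight 0.8 on the language-model term) can be sketched as a small ranking routine. All names and numbers below are hypothetical. The sketch also assumes that the weights of the kept concepts are used as they are, without renormalizing after truncation, which the slides do not specify.

```python
import heapq

LAMBDA = 0.8  # interpolation weight reported for this experiment
TOP_K = 2     # number of key concepts kept per query


def top_k_concepts(concept_weights, k=TOP_K):
    """Keep the k concepts with the largest p(c_i|q)."""
    return dict(heapq.nlargest(k, concept_weights.items(), key=lambda item: item[1]))


def rank_documents(docs, concept_weights, lam=LAMBDA, k=TOP_K):
    """Rank documents by lam * p(q|d) + (1 - lam) * sum over the top-k concepts.

    docs: doc id -> (p(q|d), {concept: p(c_i|d)})
    """
    kept = top_k_concepts(concept_weights, k)
    scored = []
    for doc_id, (p_q_given_d, p_c_given_d) in docs.items():
        concept_part = sum(w * p_c_given_d.get(c, 0.0) for c, w in kept.items())
        scored.append((lam * p_q_given_d + (1.0 - lam) * concept_part, doc_id))
    return [doc_id for _, doc_id in sorted(scored, reverse=True)]


# Hypothetical concept weights and per-document probabilities.
weights = {
    "spanish civil war": 0.45,
    "material international support": 0.30,
    "all kinds": 0.15,
    "either side": 0.10,
}
docs = {
    "d1": (0.020, {"spanish civil war": 0.05, "all kinds": 0.9}),
    "d2": (0.021, {"either side": 0.9}),
}
ranking = rank_documents(docs, weights)
```

In this toy example `d2` has the slightly higher language-model score, but `d1` ranks first because it matches a kept key concept; matches on the discarded concepts contribute nothing.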
• KeyConcept[2]<desc> : the authors’ method, using the top 2 key concepts
• SeqDep<desc> : baseline that includes all bigrams in the query
What to take home?
• Singling out key concepts improves retrieval