Discovering Key Concepts in Verbose Queries

Michael Bendersky and W. Bruce Croft

University of Massachusetts

SIGIR 2008

Objective

• “Discovering Key Concepts in Verbose Queries”

• <num> Number 829

<title> Spanish Civil War support

<desc> Provide information on all kinds of material international support provided to either side in the Spanish Civil War

• How can key concepts be used?

• Combine them with an existing retrieval model

Retrieval Model

• Conventional Language Model:

$$\mathrm{score}(q,d) = p(q|d) = \frac{p(q,d)}{p(d)}$$

• New Model:

$$\mathrm{score}(q,d) = p(q|d) = \frac{p(q,d)}{p(d)} = \frac{\sum_{c_i} p(q,d,c_i)}{p(d)}$$

Final Retrieval Function

$$\mathrm{score}(q,d) = \lambda\, p(q|d) + (1-\lambda) \sum_{c_i} p(c_i|q)\, p(c_i|d)$$

• Language Model: $\lambda\, p(q|d)$

• Key Concepts: $(1-\lambda) \sum_{c_i} p(c_i|q)\, p(c_i|d)$
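The final retrieval function can be computed as in the sketch below. It assumes $p(q|d)$ and each $p(c_i|d)$ have already been estimated by some document language model (the slides do not say how, so that step is left out) and that the concept weights $p(c_i|q)$ are given. `concept_score` and its arguments are hypothetical names, and the numbers in the usage example are toy values, not results from the paper.

```python
from typing import Dict


def concept_score(
    p_q_given_d: float,
    p_c_given_q: Dict[str, float],
    p_c_given_d: Dict[str, float],
    lam: float = 0.8,
) -> float:
    """Score = lam * p(q|d) + (1 - lam) * sum_i p(c_i|q) * p(c_i|d).

    p_c_given_q maps each query concept c_i to its weight p(c_i|q);
    p_c_given_d maps a concept to p(c_i|d) (missing concepts count as 0.0).
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    # Key-concept component: concept weight times the document's concept probability.
    concept_part = sum(
        weight * p_c_given_d.get(concept, 0.0)
        for concept, weight in p_c_given_q.items()
    )
    # Linear interpolation with the full-query language model score.
    return lam * p_q_given_d + (1.0 - lam) * concept_part


if __name__ == "__main__":
    # Toy inputs for illustration only.
    weights = {"spanish civil war": 0.7, "material international support": 0.3}
    doc = {"spanish civil war": 0.02, "material international support": 0.001}
    print(concept_score(1e-6, weights, doc, lam=0.8))
```

Setting `lam=1.0` recovers the conventional query-likelihood score; smaller values shift weight toward the key concepts.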

What is a Concept?

• Noun phrase in a query

• <num> Number 829

<title> Spanish Civil War support

<desc> Provide information on all kinds of material international support provided to either side in the Spanish Civil War

Finding ‘Key’ Concepts

• Rank concepts by $p(c_i|q)$

• Compute $p(c_i|q)$ by frequency?

• <num> Number 829

<title> Spanish Civil War support

<desc> Provide information on all kinds of material international support provided to either side in the Spanish Civil War
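One naive reading of "by frequency" is to count how often each candidate concept occurs in the query text and normalize the counts. The sketch below does this for the <desc> of topic 829; the three candidate concepts are hand-picked for illustration (an assumption, not the authors' annotation) and `frequency_estimate` is a hypothetical helper.

```python
from typing import Dict, List

# <desc> field of TREC topic 829, copied from the slide.
DESC = ("Provide information on all kinds of material international support "
        "provided to either side in the Spanish Civil War")


def frequency_estimate(query: str, concepts: List[str]) -> Dict[str, float]:
    """Estimate p(c|q) as count(c in q) / total count over all candidate concepts.

    Matching is case-insensitive substring counting; a real system would
    match on token boundaries.
    """
    text = query.lower()
    counts = {c: text.count(c.lower()) for c in concepts}
    total = sum(counts.values())
    if total == 0:
        return {c: 0.0 for c in concepts}
    return {c: n / total for c, n in counts.items()}


if __name__ == "__main__":
    # Hand-picked candidate noun phrases: an illustrative assumption.
    candidates = ["information", "material international support", "Spanish Civil War"]
    for concept, p in frequency_estimate(DESC, candidates).items():
        print(f"{concept}: {p:.3f}")
```

Each candidate occurs exactly once in this <desc>, so every one prints 0.333: a raw count gives them equal weight and does not single out any of them.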

Finding ‘Key’ Concepts

• Approximate $p(c_i|q)$ by machine learning

• $h(c_i)$ is $c_i$’s query-independent importance score

• $p(c_i|q) = \frac{h(c_i)}{\sum_{c_j \in q} h(c_j)}$

$c_i$ → AdaBoost.M1 → $h(c_i)$
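The learning step can be sketched with a from-scratch AdaBoost.M1 (Freund and Schapire) using one-feature threshold stumps on binary labels (1 = key concept, 0 = not key). The slides do not say which base learner or toolkit the authors used; the stumps, taking $h(c_i)$ to be the boosted vote for the "key" class, the toy feature vectors, and all function names are assumptions. The last function applies the normalization $p(c_i|q) = h(c_i) / \sum_{c_j \in q} h(c_j)$.

```python
import math
from typing import Dict, List, Sequence, Tuple

Stump = Tuple[int, float, int]     # (feature index, threshold, polarity)
Model = List[Tuple[Stump, float]]  # (weak learner, vote weight log(1/beta_t))


def stump_predict(stump: Stump, x: Sequence[float]) -> int:
    """Predict 1 (key) or 0 (non-key) from a single feature threshold."""
    j, thr, pol = stump
    above = x[j] > thr
    return int(above) if pol == 1 else int(not above)


def fit_stump(X: Sequence[Sequence[float]], y: Sequence[int],
              w: Sequence[float]) -> Tuple[Stump, float]:
    """Exhaustively pick the stump with the lowest weighted training error."""
    best: Stump = (0, 0.0, 1)
    best_err = float("inf")
    for j in range(len(X[0])):
        for thr in sorted({x[j] for x in X}):
            for pol in (1, -1):
                s = (j, thr, pol)
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if stump_predict(s, xi) != yi)
                if err < best_err:
                    best, best_err = s, err
    return best, best_err


def adaboost_m1(X: Sequence[Sequence[float]], y: Sequence[int],
                rounds: int = 10) -> Model:
    n = len(X)
    w = [1.0 / n] * n
    model: Model = []
    for _ in range(rounds):
        stump, err = fit_stump(X, y, w)
        if err >= 0.5:  # AdaBoost.M1 stops when the weak learner is no better than chance
            break
        beta = max(err, 1e-10) / (1.0 - err)  # guard against log(1/0)
        model.append((stump, math.log(1.0 / beta)))
        if err == 0:  # a perfect stump: later rounds would only repeat it
            break
        # Multiply weights of correctly classified examples by beta, then renormalize.
        w = [wi * beta if stump_predict(stump, xi) == yi else wi
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def h_score(model: Model, x: Sequence[float]) -> float:
    """Importance score h(c): summed vote of the rounds that predict 'key' (label 1)."""
    return sum(vote for stump, vote in model if stump_predict(stump, x) == 1)


def concept_probs(model: Model, concepts: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """p(c_i|q) = h(c_i) / sum_j h(c_j) over the candidate concepts of one query."""
    scores = {c: h_score(model, x) for c, x in concepts.items()}
    total = sum(scores.values())
    if total == 0:  # no concept gets a 'key' vote: fall back to uniform weights
        return {c: 1.0 / len(scores) for c in scores}
    return {c: s / total for c, s in scores.items()}


if __name__ == "__main__":
    # Toy training data with two hypothetical feature columns; label 1 = key concept.
    X = [[8.0, 5.0], [7.5, 4.0], [2.0, 1.0], [3.0, 6.0], [9.0, 0.5], [1.0, 0.2]]
    y = [1, 1, 0, 0, 1, 0]
    model = adaboost_m1(X, y, rounds=5)
    print(concept_probs(model, {"spanish civil war": [8.5, 4.5], "information": [1.5, 6.5]}))
```

Because $h(c_i)$ only sums non-negative votes, the normalized values are valid probabilities over the concepts of a single query.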

Features of a Concept

• is_cap: whether the concept is capitalized

• tf: term frequency in the corpus

• idf: inverse document frequency in the corpus

• ridf: residual idf, i.e. idf modified by a Poisson model

• wig: weighted information gain; change in entropy from the corpus to the retrieved data

• g_tf: Google term frequency

• qp: number of times the concept appears as part of a query in MSN Live

• qe: number of times the concept appears as an exact query in MSN Live
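The `idf` and `ridf` features can be sketched as follows. The slide only says that ridf is idf modified by a Poisson model; the sketch assumes the standard residual-idf definition of Church and Gale (observed idf minus the idf a Poisson model predicts from the collection frequency), with base-2 logarithms. `df`, `cf` and `n_docs` are hypothetical inputs: document frequency, collection frequency, and number of documents.

```python
import math


def idf(df: int, n_docs: int) -> float:
    """Inverse document frequency: -log2(df / N)."""
    return -math.log2(df / n_docs)


def ridf(df: int, cf: int, n_docs: int) -> float:
    """Residual idf: observed idf minus the idf expected under a Poisson model.

    With rate theta = cf / N, a Poisson model predicts that a fraction
    1 - exp(-theta) of documents contain the concept at least once.
    """
    theta = cf / n_docs
    expected_idf = -math.log2(1.0 - math.exp(-theta))
    return idf(df, n_docs) - expected_idf


if __name__ == "__main__":
    # Same collection frequency, concentrated vs. spread-out occurrences (toy numbers).
    print(ridf(df=20, cf=100, n_docs=1000))
    print(ridf(df=95, cf=100, n_docs=1000))
```

A concept concentrated in fewer documents than its collection frequency would predict gets a clearly positive ridf, while one spread out the way the Poisson model expects stays near zero.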

TREC Corpus

Exp 1: Identifying Key Concepts

• Cross-validation on the corpus

• Each fold has 50 queries

• Check whether the top-ranked concept is a key concept

• Annotation assumes one key concept per query

• Ranking concepts by the learned $p(c_i|q)$ places the key concept on top more often than ranking them by idf
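The Exp 1 protocol can be sketched as follows: split the queries into folds of 50, fit a scorer on the other folds, and count how often the top-ranked concept of each held-out query is its annotated key concept. The data layout (one annotated key concept per query, as on the slide), the helper names, and the idf baseline scorer are hypothetical; the training step is left to a caller-supplied function.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Features = Dict[str, float]
Scorer = Callable[[Features], float]


@dataclass
class AnnotatedQuery:
    concepts: Dict[str, Features]  # candidate concept -> its feature values
    key: str                       # the single annotated key concept


def is_top_correct(q: AnnotatedQuery, score: Scorer) -> bool:
    """True if the highest-scoring candidate is the annotated key concept."""
    top = max(q.concepts, key=lambda c: score(q.concepts[c]))
    return top == q.key


def cross_validated_top1(queries: List[AnnotatedQuery],
                         train: Callable[[List[AnnotatedQuery]], Scorer],
                         fold_size: int = 50) -> float:
    """Fraction of queries whose top-ranked concept is the key concept."""
    folds = [queries[i:i + fold_size] for i in range(0, len(queries), fold_size)]
    hits = 0
    for k, test_fold in enumerate(folds):
        training = [q for j, fold in enumerate(folds) if j != k for q in fold]
        score = train(training)  # fit only on the other folds
        hits += sum(is_top_correct(q, score) for q in test_fold)
    return hits / len(queries)


def idf_baseline(_training: List[AnnotatedQuery]) -> Scorer:
    """Baseline that ignores the training folds and ranks concepts by idf."""
    return lambda features: features["idf"]
```

A learned ranking is compared by passing a trainer that fits a classifier on the training folds and returns its scoring function in place of `idf_baseline`.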

Exp 2: Information Retrieval

$$\mathrm{score}(q,d) = \lambda\, p(q|d) + (1-\lambda) \sum_{c_i} p(c_i|q)\, p(c_i|d)$$

• Use only the top 2 concepts for each query

• $q$ is the entire <desc> section; $\lambda = 0.8$
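Restricting the concept term to the two highest-weighted concepts can be sketched as below. The slide does not say whether the weights of the retained concepts are renormalized; this sketch keeps the original $p(c_i|q)$ values unchanged, which is an assumption. `top_k_concepts` and `exp2_score` are hypothetical names, and the default $\lambda = 0.8$ follows the slide.

```python
from typing import Dict


def top_k_concepts(p_c_given_q: Dict[str, float], k: int = 2) -> Dict[str, float]:
    """Keep the k concepts with the highest p(c_i|q); weights are NOT renormalized."""
    ranked = sorted(p_c_given_q.items(), key=lambda item: item[1], reverse=True)
    return dict(ranked[:k])


def exp2_score(p_q_given_d: float, p_c_given_q: Dict[str, float],
               p_c_given_d: Dict[str, float], lam: float = 0.8, k: int = 2) -> float:
    """lam * p(q|d) + (1 - lam) * sum over the top-k concepts of p(c_i|q) * p(c_i|d)."""
    kept = top_k_concepts(p_c_given_q, k)
    concept_part = sum(w * p_c_given_d.get(c, 0.0) for c, w in kept.items())
    return lam * p_q_given_d + (1.0 - lam) * concept_part
```

Dropping the low-weight tail keeps the concept term focused on the few concepts the classifier is most confident about; renormalizing instead would give the two kept concepts more total weight.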

Exp 2: Information Retrieval

• KeyConcept[2]<desc>: the authors' method

• SeqDep<desc>: sequential dependence model; includes all bigrams in the query
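A sequential dependence query over the <desc> terms can be sketched in Indri query-language syntax. The weights 0.85/0.10/0.05 and the unordered window of 8 are the defaults commonly used with the sequential dependence model of Metzler and Croft, assumed here because the slide does not give them; `sdm_query` is a hypothetical helper, and tokenization and stopword removal are left to the caller.

```python
from typing import List


def sdm_query(terms: List[str], w_t: float = 0.85, w_o: float = 0.10,
              w_u: float = 0.05, window: int = 8) -> str:
    """Build an Indri-style sequential dependence query from a list of terms."""
    bigrams = list(zip(terms, terms[1:]))  # all adjacent term pairs
    unigram_part = "#combine( " + " ".join(terms) + " )"
    if not bigrams:
        return unigram_part
    # Exact ordered phrases (#1) and unordered windows (#uwN) for every bigram.
    ordered = " ".join(f"#1( {a} {b} )" for a, b in bigrams)
    unordered = " ".join(f"#uw{window}( {a} {b} )" for a, b in bigrams)
    return (f"#weight( {w_t} {unigram_part} "
            f"{w_o} #combine( {ordered} ) "
            f"{w_u} #combine( {unordered} ) )")


if __name__ == "__main__":
    print(sdm_query("spanish civil war".split()))
```

Unlike the key-concept model, this query treats every adjacent pair in the <desc> alike, whether or not it belongs to a key concept.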

Take-Home Message

• Singling out key concepts in verbose queries improves retrieval