Exploiting Semantics with Structured Queries

Date posted: 14-Jan-2016
Hugo Zaragoza (Yahoo! Research), CLEF 2008

Jose Ramón Pérez-Agüera & Hugo Zaragoza

U. Complutense de Madrid / Yahoo! Research (Barcelona)

Query expansion makes term independence a big issue…

we are double counting “meanings”!!!

Term independence assumption gets worse with query expansion… (example 1)

Verde que te quiero verde.
Verde viento. Verdes
El barco sobre la mar
y el caballo en la montaña.
Con la sombra en la cintura
ella sueña en su baranda
verde carne, pelo verde,
ojos de fría plata.
Bajo la luna gitana, las cosas
la están mirando y ella
no puede mirarlas.


verde3 que te quiero
verde3 viento. verde1
El barco sobre la mar
y el caballo en la montaña.
Con la sombra en la cintura
ella sueña en su baranda
verde5 carne, pelo verde1,
ojos de fría plata.
Bajo la luna gitana, las cosas
la están mirando y ella
no puede mirarlas.

[…]

q1: verde1 pelo
q2: verde1 verde2 pelo
q3: verde1 verde2 verde3 verde4 verde5 pelo

q: verde pelo (“green hair”) [CLEF EFE94, 2001 Spanish topics]
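To make the double counting concrete, here is a minimal sketch that scores the annotated excerpt against q1, q2 and the third query (labelled q3 here). It assumes each annotated token (verde1, verde3, …) is a separate indexing term, as the queries treat them, uses only the BM25 tf component with length normalisation switched off, and gives every term the same IDF; these simplifications and the helper names are illustrative, not the authors' setup.

```python
import re
from collections import Counter

# Annotated excerpt as shown on the slide (the rest of the poem is elided).
EXCERPT = """verde3 que te quiero verde3 viento. verde1
El barco sobre la mar y el caballo en la montaña.
Con la sombra en la cintura ella sueña en su baranda
verde5 carne, pelo verde1, ojos de fría plata.
Bajo la luna gitana, las cosas la están mirando y ella no puede mirarlas."""

def bm25_tf(tf: int, k1: float = 1.2) -> float:
    """BM25 term-frequency component without length normalisation (b = 0)."""
    return tf * (k1 + 1) / (tf + k1)

def green_share(query: list[str], tf: Counter) -> float:
    """Fraction of the document score contributed by 'verde*' query terms,
    assuming (for illustration only) that every term has the same IDF."""
    parts = {t: bm25_tf(tf[t]) for t in query}
    green = sum(v for t, v in parts.items() if t.startswith("verde"))
    return green / sum(parts.values())

tf = Counter(re.findall(r"\w+", EXCERPT.lower()))
queries = {
    "q1": ["verde1", "pelo"],
    "q2": ["verde1", "verde2", "pelo"],
    "q3": ["verde1", "verde2", "verde3", "verde4", "verde5", "pelo"],
}
for name, q in queries.items():
    print(name, round(green_share(q, tf), 2))
```

q2 changes nothing here because verde2 does not occur in the excerpt, but q3 raises the share of the score coming from “green” from about 0.58 to 0.79, although the document says no more about “hair” than before.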

Term independence assumption gets worse with query expansion… (example 2)

[CLEF EFE94, 2001 Spanish topics] [Pérez-Agüera, Zaragoza and Araujo, NLDB 2008]

−46% !!!

• BM25 dependence model:

[Figure: BM25 term weight as a function of tf, for tf = 1, 2, 3, 4, …, 10]
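The curve above is the standard BM25 tf saturation. A short sketch (assuming the usual k1 = 1.2, b = 0.75 and a document of average length, which are not stated on the slide) shows why splitting one meaning across several “independent” expansion terms inflates the score: five occurrences of one term saturate, while five terms occurring once each are each counted near their maximum.

```python
def bm25_tf(tf: float, k1: float = 1.2, b: float = 0.75,
            doc_len: float = 100.0, avg_len: float = 100.0) -> float:
    """BM25 tf component: tf*(k1+1) / (tf + k1*(1 - b + b*doc_len/avg_len))."""
    norm = k1 * (1 - b + b * doc_len / avg_len)
    return tf * (k1 + 1) / (tf + norm)

# Saturation curve for tf = 1..10 (average-length document).
curve = [round(bm25_tf(tf), 3) for tf in range(1, 11)]
print(curve)

# Same five occurrences: all of one term vs. spread over five "synonyms".
merged = bm25_tf(5)       # one term, tf = 5
split = 5 * bm25_tf(1)    # five independent terms, tf = 1 each
print(round(merged, 3), round(split, 3))
```

With these parameters the merged term contributes about 1.774, while the split version contributes 5.0, nearly three times as much for the same evidence.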



Term independence assumption gets worse with query expansion… (example 3)

Query Expansion (example of state of the art)

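• Term selection: Divergence From Randomness (DFR) expansion model, Bo1 [8,6]

• Term weighting: Rocchio [9]
  – tf in the top x = 1 document; top 40 terms

• Performance prediction: AvICTF [5] (cheap); threshold: > 9.0

$$\mathrm{AvICTF} = \frac{1}{ql} \log_2 \prod_{t \in q} \frac{token_C}{F_t}$$

where token_C is the number of tokens in collection C, F_t the frequency of t in C, and ql the query length.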
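The three components above can be sketched together. This is a minimal illustration under stated assumptions: Bo1 is written in Amati's usual form w(t) = tf_x·log2((1+P_n)/P_n) + log2(1+P_n) with P_n = F/N, the Rocchio-style merge uses the normalised sum qtf/max qtf + β·w/max w common in DFR expansion, β = 0.4 is an arbitrary illustrative value, and all collection statistics are made up. The slide's exact weighting formulas are not recoverable.

```python
import math

def bo1_weight(tf_x: int, coll_freq: int, num_docs: int) -> float:
    """DFR Bo1 weight of a candidate term (Bose-Einstein model):
    w(t) = tf_x * log2((1 + Pn) / Pn) + log2(1 + Pn), with Pn = F / N.
    tf_x: frequency of t in the top-ranked documents; F: frequency of t in
    the collection; N: number of documents in the collection."""
    pn = coll_freq / num_docs
    return tf_x * math.log2((1 + pn) / pn) + math.log2(1 + pn)

def expand(query_tf: dict[str, int], candidates: dict[str, tuple[int, int]],
           num_docs: int, n_terms: int = 40, beta: float = 0.4) -> dict[str, float]:
    """Keep the n_terms best candidates by Bo1 and merge them with the query
    using a Rocchio-style normalised sum: qtf/max_qtf + beta * w/max_w.
    candidates maps term -> (tf_x, collection frequency); beta is illustrative."""
    scored = {t: bo1_weight(tfx, cf, num_docs) for t, (tfx, cf) in candidates.items()}
    top = sorted(scored.items(), key=lambda kv: kv[1], reverse=True)[:n_terms]
    max_qtf = max(query_tf.values())
    max_w = top[0][1]
    weights = {t: f / max_qtf for t, f in query_tf.items()}
    for t, w in top:
        weights[t] = weights.get(t, 0.0) + beta * w / max_w
    return weights

def avictf(query: list[str], coll_freq: dict[str, int], total_tokens: int) -> float:
    """AvICTF = (1/ql) * sum over t in q of log2(token_C / F_t)."""
    return sum(math.log2(total_tokens / coll_freq[t]) for t in query) / len(query)

# Hypothetical collection statistics, for illustration only.
query = ["cheap", "restaurant"]
coll_freq = {"cheap": 3_000, "restaurant": 5_000}
if avictf(query, coll_freq, total_tokens=50_000_000) > 9.0:
    candidates = {"affordable": (4, 200), "pizzeria": (3, 50), "the": (40, 1_000_000)}
    weights = expand({t: 1 for t in query}, candidates, num_docs=100_000, n_terms=2)
    print(weights)
```

The sum of logs in `avictf` equals the log of the product in the formula above. In this toy run the frequent word “the” gets the lowest Bo1 weight and is dropped when only two expansion terms are kept.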




Results in CLEF 2008 Robust-WSD Task:

• Standard query expansion:
  – 3rd of 8 teams in CLEF Robust; the 1st team was well ahead of everyone.
  – It seems no one improved GMAP, so teams reported MAP.

Query expansion makes term independence a big issue…

we are double counting “meanings”!!!

Query: “Cheap Barcelona Italian Restaurants” → {cheap, barcelona, italian, restaurant}

Expansion: {cheap, barcelona, italian, restaurant, inexpensive, affordable, Sagrada Familia, Ramblas, Gràcia, Barceloneta, pizzeria, trattoria, café}

Structure: collect related meanings in clauses:
{ {cheap, inexpensive, affordable}, {Barcelona, Sagrada Familia, Ramblas, Gràcia, Barceloneta, …}, {Italian_restaurant, pizzeria, trattoria, café} }
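The difference between a flat expanded query and clauses can be shown by simply counting matches. In this sketch the clauses are taken from the example above (lower-cased, multi-word names kept as one string), and the two documents are hypothetical: one mentions a single meaning three ways, the other covers three different meanings.

```python
# Clauses from the slide's example, lower-cased; multi-word names kept as one string.
CLAUSES = [
    {"cheap", "inexpensive", "affordable"},
    {"barcelona", "sagrada familia", "ramblas", "gràcia", "barceloneta"},
    {"italian_restaurant", "pizzeria", "trattoria", "café"},
]
EXPANDED_QUERY = set().union(*CLAUSES)  # flat bag of words

def bag_of_words_matches(doc_terms: set[str]) -> int:
    """Term independence: each matching expansion term counts on its own."""
    return len(EXPANDED_QUERY & doc_terms)

def clause_matches(doc_terms: set[str]) -> int:
    """Clause independence: a meaning counts once, however many of its terms match."""
    return sum(1 for clause in CLAUSES if clause & doc_terms)

doc_a = {"pizzeria", "trattoria", "café"}        # one meaning, three words
doc_b = {"affordable", "barcelona", "pizzeria"}  # three meanings, one word each
print(bag_of_words_matches(doc_a), bag_of_words_matches(doc_b))
print(clause_matches(doc_a), clause_matches(doc_b))
```

The flat query cannot tell the two documents apart (3 matches each), while counting clauses prefers the document that actually covers cheap, Barcelona and Italian restaurants.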


Query Clauses Idea: clause independence, not term independence

[Figure: original query terms (term 1, term 2) and expansion terms (term e1 … term e4) grouped into clauses]

(same idea as BM25-F on fields [10])

Query Clauses Model

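Bag of words:

$$q = \{t_0, t_1, \ldots, t_{|q|}\}$$

$$score(d) = \sum_{t \in q} W_1(tf(t), l) \cdot W_2(t, C)$$

Query clauses (bag of bags of weighted words):

$$c = \{(t_0, w_0), (t_1, w_1), \ldots, (t_{|c|}, w_{|c|})\}$$

$$q = \{c_0, c_1, \ldots, c_{|q|}\}$$

$$score(d) = \sum_{c \in q} W_1\Big(\sum_{(t,w) \in c} w \cdot tf(t),\; l\Big) \cdot W_2(c, C)$$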
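A minimal sketch of the clause score, assuming W1 is the BM25 tf component (k1 = 1.2, b = 0.75) and W2 is a placeholder returning 1.0 where a clause IDF would go. Clause membership weights (0.5 for expansion terms) are hypothetical. As in BM25-F, frequencies are summed inside a clause before saturation, so the bag-of-words model is the special case where every weighted term is its own singleton clause.

```python
def w1_bm25(tf: float, doc_len: float, avg_len: float,
            k1: float = 1.2, b: float = 0.75) -> float:
    """W1: BM25 saturation of a (possibly fractional) frequency."""
    return tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_len))

def clause_tf(clause: list[tuple[str, float]], doc_tf: dict[str, int]) -> float:
    """tf(c) = sum over (t, w) in c of w * tf(t)."""
    return sum(w * doc_tf.get(t, 0) for t, w in clause)

def score(query, doc_tf, doc_len, avg_len, w2=lambda clause: 1.0) -> float:
    """score(d) = sum over clauses c in q of W1(tf(c), l) * W2(c, C).
    w2 defaults to 1.0, a placeholder for a clause IDF."""
    return sum(w1_bm25(clause_tf(c, doc_tf), doc_len, avg_len) * w2(c) for c in query)

doc_tf = {"cheap": 1, "affordable": 1, "pizzeria": 1, "trattoria": 1}
clauses = [
    [("cheap", 1.0), ("inexpensive", 0.5), ("affordable", 0.5)],
    [("italian_restaurant", 1.0), ("pizzeria", 0.5), ("trattoria", 0.5)],
]
# Same weighted terms scored independently: one singleton clause per term.
flat = [[(t, w)] for c in clauses for (t, w) in c]
print(round(score(clauses, doc_tf, 100, 100), 3), round(score(flat, doc_tf, 100, 100), 3))
```

With identical term weights, the clause version scores 2.222 and the flat version 2.941: pizzeria and trattoria now share one saturation curve instead of adding up as two separate pieces of evidence.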



Matrix notation: let , then redefine each document as


clause term frequency:

clause collection frequency:

clause document likelihood:

clause collection likelihood:

In general, the projection is query-dependent and must be computed online:
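The definitions on this slide did not survive extraction. One way to read “projection” is sketched below, purely as an assumption: a query-specific matrix P (clauses × vocabulary terms) holds each term's clause weight, so multiplying P by a document's term-frequency vector gives clause term frequencies, and multiplying it by collection frequencies gives clause collection frequencies. The likelihoods are taken as maximum-likelihood estimates relative to document and collection length, which is also an assumption. All vocabulary entries and counts are toy values.

```python
# Hypothetical vocabulary and query-specific clause matrix P (clauses x terms):
# P[i][j] is the weight of vocabulary term j in clause i, 0 if absent.
VOCAB = ["verde", "verdes", "pelo", "barco"]
P = [
    [1.0, 0.5, 0.0, 0.0],  # clause 0: {(verde, 1.0), (verdes, 0.5)}
    [0.0, 0.0, 1.0, 0.0],  # clause 1: {(pelo, 1.0)}
]

def project(matrix: list[list[float]], vec: list[float]) -> list[float]:
    """Matrix-vector product with plain lists: one entry per clause."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]

doc_tf = [3, 1, 1, 2]           # toy term frequencies of one document over VOCAB
coll_tf = [500, 200, 300, 900]  # toy collection frequencies over VOCAB

clause_tf = project(P, doc_tf)   # clause term frequency
clause_cf = project(P, coll_tf)  # clause collection frequency
# Maximum-likelihood estimates relative to document / collection length (assumed).
clause_doc_lik = [x / sum(doc_tf) for x in clause_tf]
clause_coll_lik = [x / sum(coll_tf) for x in clause_cf]
print(clause_tf, clause_cf)
```

Because P depends on the clauses, and the clauses depend on the query, P has to be rebuilt for every query; this is why the projection cannot be precomputed in the index.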

Query Clauses Implementation of W1 and W2


IDF is not straightforward; there are several possibilities:

– min, max or average of the term IDFs (leads to inconsistent situations for small weights)
– expected clause IDF:

$$c = \{(t_0, w_0), (t_1, w_1), \ldots, (t_{|c|}, w_{|c|})\} \;\rightarrow\; idf(t_0), idf(t_1), \ldots, idf(t_{|c|})$$
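The options above can be compared on a toy clause. This sketch assumes a plain idf(t) = ln(N/df(t)); the “expected” variant is a hypothetical stand-in (a weight-normalised average of the term IDFs), since the slide's own expected-IDF formula is not recoverable. Document frequencies and N are made up.

```python
import math

def clause_idf(clause: list[tuple[str, float]], df: dict[str, int],
               num_docs: int, how: str = "expected") -> float:
    """Combine the IDFs of a clause's terms into one clause IDF.
    Uses idf(t) = ln(N / df(t)); 'expected' is a weight-normalised average,
    a hypothetical stand-in for the slide's expected clause IDF."""
    pairs = [(w, math.log(num_docs / df[t])) for t, w in clause]
    idfs = [v for _, v in pairs]
    if how == "min":
        return min(idfs)
    if how == "max":
        return max(idfs)
    if how == "avg":
        return sum(idfs) / len(idfs)
    if how == "expected":
        return sum(w * v for w, v in pairs) / sum(w for w, _ in pairs)
    raise ValueError(f"unknown combination: {how}")

clause = [("verde", 1.0), ("verdes", 0.2)]
df = {"verde": 100, "verdes": 10}
for how in ("min", "max", "avg", "expected"):
    print(how, round(clause_idf(clause, df, 10_000, how), 3))
```

Here max lets the rare, low-weight variant “verdes” set the clause IDF (6.908) even though it carries only 0.2 of the weight, and avg (5.756) still gives it equal say; the weighted version (4.989) stays close to the main term's IDF (4.605). This is one possible reading of the “inconsistent situations for small weights” noted above.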








How can we construct the clauses?

• Idea: use WordNet to expand each term in the query as a clause.

• Idea: use statistical methods to expand each term in the query.

• Idea: use query expansion to find terms, use statistical methods to group them into clauses.

• Idea: use query expansion to find terms, use WordNet to group them into clauses.
  – There exist several semantic similarity measures based on WordNet [11]: WN(s1,s2)
  – We construct a clause for every original query term, and we add to it the expanded terms with WN(s1,s2) < k.
  – To be conservative, all terms not in an original clause are added together to a new “Other” clause.
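The construction rule can be sketched as follows. Assumptions: the WordNet measure is passed in as a function (here a hypothetical lookup table stands in for a real WordNet score), expansion weights come from the DFR step, original query terms get weight 1.0, and the acceptance test defaults to the slide's WN(s1,s2) < k, which presumes WN behaves like a distance; for a similarity measure such as Wu-Palmer, pass `accept=lambda s, k: s > k` instead. An expansion term that passes the test for several query terms joins each of their clauses.

```python
def build_clauses(query_terms, expansion, wn, k, accept=lambda score, k: score < k):
    """One clause per original query term (weight 1.0), plus every expansion
    term t with accept(wn(q, t), k); leftovers go to a final 'Other' clause.
    query_terms: list of original terms; expansion: {term: weight} (e.g. DFR);
    wn(a, b): a WordNet-based score; accept defaults to the slide's WN < k."""
    clauses = {q: [(q, 1.0)] for q in query_terms}
    placed = set()
    for t, w in expansion.items():
        if t in clauses:
            continue  # original query terms already head their own clause
        for q in query_terms:
            if accept(wn(q, t), k):
                clauses[q].append((t, w))
                placed.add(t)
    other = [(t, w) for t, w in expansion.items() if t not in placed and t not in clauses]
    return list(clauses.values()) + ([other] if other else [])

# Hypothetical distance table standing in for a real WordNet measure.
DIST = {("cheap", "affordable"): 0.1, ("restaurant", "pizzeria"): 0.2}

def wn(a: str, b: str) -> float:
    return DIST.get((a, b), 1.0)

expansion = {"affordable": 0.6, "pizzeria": 0.5, "ramblas": 0.3}
print(build_clauses(["cheap", "restaurant"], expansion, wn, k=0.5))
```

“ramblas” is not close enough to either query term, so it ends up in the conservative “Other” clause rather than inflating an existing meaning.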

• Implementation:
  – DFR expansion: 40 new terms extracted for each query:
    $$c = \{(t_0, w_0), (t_1, w_1), \ldots, (t_{|c|}, w_{|c|})\}$$
  – Query clauses: built with WordNet similarity:
    $$q = \{c_0, c_1, \ldots, c_{|q|}\}$$
  – Ranking: BM25 with standard parameters, on clauses:
    $$tf(c) = \sum_{(t,w) \in c} w \cdot tf(t)$$

Results in CLEF 2008 Robust-WSD Task:

Query Clauses: 4% rel. impr. (overall results)

• 2nd team in CLEF Robust; the 1st team was well ahead, without the use of WSD.


[10] H. Zaragoza, N. Craswell, M. Taylor, S. Saria, and S. Robertson. Microsoft Cambridge at TREC 13: Web and hard tracks. In Text REtrieval Conference (TREC-13), 2004.

[11] Z. Wu and M. Palmer. Verb semantics and lexical selection. In 32nd Annual Meeting of the Association for Computational Linguistics (ACL), 1994.
