Reverted Indexing for Expansion and Feedback

Post on 10-May-2015


description

Pickens, J., Cooper, M., and Golovchinsky, G. Reverted Indexing for Expansion and Feedback. In Proc. CIKM 2010, Toronto, Canada, ACM Press. See http://fxpal.com/?p=abstract&abstractID=581

transcript

Reverted Indexing for Feedback and Expansion

Jeremy Pickens, Matthew Cooper, Gene Golovchinsky

Jeremy Pickens, Catalyst Repository Systems

Query-Document Duality has long history

• Using queries to label documents

• Queries and documents as bipartite graph
– Used for random walks
– Used for partitioning

• Reverse Querying

Motivation – Three R’s

Retrievability

Reuse (Algorithmic)

Recall-Oriented Tasks

Our Key Contribution

We treat query result sets as unstructured text “documents” -- and index them

Outline

• Reverted Documents
• Reverted Indexing
• Experimental Setup
• Results
– Effectiveness
– Efficiency
• Related Work
• Future Extensions

Reverted Document

• ID (Basis Query)
– Query Expression
– Ranking Algorithm
• Body
– Results (docid)
– Results (score)

Basis Query (Reverted Document ID)

Query Expression                  Ranking Algorithm
giraffe                           BM25
cheetah                           BM25
gazelle                           BM25
gazelle                           Language Model
gazelle                           PL2 (Divergence from Randomness)
gazelle                           Y
gazelle                           B
gazelle                           G
fast cheetah                      BM25
cheetah AND NOT gazelle           Boolean
Latitude+Longitude of Zanzibar    Euclidean distance

Reverted Document Body

• Results (docid): Canonical URL and/or docid
• Results (score):
1. Probability of Relevance
2. Cosine similarity
3. KL Divergence
4. Raw Rank
5. 1 or 0 (Boolean)

rank  docid  score  shift-scale  Anh&Moffat
1     #415   0.82   10.0         10
2     #32    0.73   8.92         9
3     #63    0.62   7.57         8
4     #7     0.49   5.95         6
5     #56    0.35   4.24         4
6     #12    0.14   1.72         2
7     #108   0.12   1.36         1
8     #115   0.09   1.09         1
9     #42    0.08   1.0          1
10    #85    0.08   1.0          1

Result Set→Document Body

docid  Anh&Moffat
#415   10
#32    9
#63    8
#7     6
#56    4
#12    2
#108   1
#115   1
#42    1
#85    1

<text>415 415 415 415 415 415 415 415 415 415 32 32 32 32 32 32 32 32 32 63 63 63 63 63 63 63 63 7 7 7 7 7 7 56 56 56 56 12 12 108 115 42 85</text>
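The mapping from a scored result list to the repeated-docid body can be sketched as follows. This is a minimal sketch under assumptions, not the authors' implementation: `shift_scale`, `quantize`, and `reverted_body` are hypothetical names, the scaling is assumed to be a linear min-max map onto [1, 10], and the integer step is assumed to be plain rounding. Because the scores shown on the slide are rounded to two decimals, the intermediate values can differ slightly from the shift-scale column, while the resulting integers match the final column of the table.

```python
def shift_scale(scores, low=1.0, high=10.0):
    """Linearly map scores onto [low, high] (assumed min-max scaling)."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All scores equal: give every document the top weight.
        return [high for _ in scores]
    return [low + (high - low) * (s - lo) / (hi - lo) for s in scores]


def quantize(scaled):
    """Round each scaled score to the nearest integer (half rounds up)."""
    return [int(x + 0.5) for x in scaled]


def reverted_body(results):
    """Repeat each docid as many times as its integer weight."""
    docids = [docid for docid, _ in results]
    weights = quantize(shift_scale([score for _, score in results]))
    tokens = []
    for docid, weight in zip(docids, weights):
        tokens.extend([docid] * weight)
    return " ".join(tokens)


# Result list for the basis query [gazelle : BM25], taken from the slide.
results = [
    ("415", 0.82), ("32", 0.73), ("63", 0.62), ("7", 0.49), ("56", 0.35),
    ("12", 0.14), ("108", 0.12), ("115", 0.09), ("42", 0.08), ("85", 0.08),
]
body = reverted_body(results)
```

Repeating a docid turns its retrieval score into an ordinary term frequency, so an unmodified text indexer can consume the body.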


Reverted Document

<document>
<docid>[gazelle : BM25]</docid>
<text>415 415 415 415 415 415 415 415 415 415 32 32 32 32 32 32 32 32 32 63 63 63 63 63 63 63 63 7 7 7 7 7 7 56 56 56 56 12 12 108 115 42 85</text>
</document>

Fin

Questions?


Reverted Indexing

1. Choose a set of basis queries

2. For each basis query:
   1. Execute the query, producing results up to cutoff depth k
   2. Use the results to create a “reverted document”
   3. Add the reverted document to the index

How basis queries are chosen (in these experiments): All singleton terms (unigrams) with df ≥ 2. Ranking algorithm for all basis queries is PL2.
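The indexing loop above can be sketched in a few lines. This is an illustrative sketch under assumptions: `toy_rank` is a hypothetical stand-in ranker that scores by raw term count (the experiments use PL2), and `build_reverted_index` and the three-document corpus are made up for the example. Basis queries follow the rule stated above: all single terms with df ≥ 2.

```python
from collections import Counter, defaultdict


def toy_rank(term, corpus, k):
    """Hypothetical ranker: score = raw term count (stand-in for PL2)."""
    scored = [(doc_id, text.split().count(term)) for doc_id, text in corpus.items()]
    scored = [(doc_id, score) for doc_id, score in scored if score > 0]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:k]  # keep results up to cutoff depth k


def build_reverted_index(corpus, k=1000, min_df=2):
    # Step 1: choose basis queries (single terms with df >= min_df).
    df = Counter()
    for text in corpus.values():
        df.update(set(text.split()))
    basis_queries = sorted(term for term, n in df.items() if n >= min_df)

    # Step 2: run each basis query, build its reverted document, index it.
    reverted_docs = {}            # basis query -> ranked (docid, score) list
    postings = defaultdict(dict)  # docid -> {basis query: score}
    for query in basis_queries:
        results = toy_rank(query, corpus, k)
        reverted_docs[query] = results
        for doc_id, score in results:
            postings[doc_id][query] = score
    return reverted_docs, dict(postings)


corpus = {
    "d1": "cheetah chases gazelle",
    "d2": "gazelle gazelle grazes",
    "d3": "giraffe grazes",
}
reverted_docs, postings = build_reverted_index(corpus, k=2)
```

In the resulting index the roles are swapped: docids act as the vocabulary and basis queries act as the documents being retrieved.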

Figure: Standard Index and Reverted Index

Reverted Index Statistics

• Term Frequency: Retrieval Score of docid
• Document Length: Sum of Retrieval Scores of all docids retrieved by a Basis Query
• Document Frequency: Number of Basis Queries that docid was retrieved by
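A small sketch of how these three statistics fall out of a set of reverted documents. The data and the function names are hypothetical, chosen only to illustrate the correspondence; scores are assumed to be already quantized to integers.

```python
# Hypothetical reverted documents: basis query -> {docid: quantized score}.
reverted_docs = {
    "gazelle": {"415": 10, "32": 9, "63": 8},
    "cheetah": {"415": 4, "7": 6},
    "giraffe": {"32": 1},
}


def term_frequency(basis_query, docid):
    """Retrieval score of docid within one basis query's result set."""
    return reverted_docs[basis_query].get(docid, 0)


def document_length(basis_query):
    """Sum of retrieval scores of all docids retrieved by the basis query."""
    return sum(reverted_docs[basis_query].values())


def document_frequency(docid):
    """Number of basis queries whose result set contains docid."""
    return sum(1 for body in reverted_docs.values() if docid in body)
```

With these in place, any standard weighting scheme that needs only tf, document length, and df can be applied to the reverted index unchanged.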


Experiment: Relevance Feedback

1. Run initial query using PL2 (Terrier platform): [poaching wildlife preserves]

2. Judge top k documents for relevance

3. Select and weight query expansion terms, using one of:
– KL Divergence
– Bo1
– PL2 retrieval on the Reverted Index

4. Expand using top 500 terms (strongest baseline @ 500)

5. Run expanded query using PL2

6. Evaluate

Reverted Index→Expansion

1. Original query = [poaching wildlife preserves]

2. Reverted query = [#415 #56 #42 #85]

3. Expanded query = [poaching^2.0 wildlife^1.24 preserves^1.0 poachers^0.57 tsavo^0.56 leakey^0.41 tusks^0.39 …]

term       original  retrieved  weight
poaching   1         1.0        2.0
poachers   0         0.57       0.57
tsavo      0         0.56       0.56
leakey     0         0.41       0.41
tusks      0         0.39       0.39
elephants  0         0.34       0.34
wildlife   1         0.24       1.24
kws        0         0.2        0.2
…          …         …          …
preserves  1         0          1.0
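In the table, each weight equals the original-query indicator plus the score retrieved from the reverted index. A minimal sketch of that combination follows; the function names `expand` and `to_query` are hypothetical, and the retrieved scores are copied from the table rather than computed by a real PL2 run.

```python
def expand(original_terms, retrieved):
    """Combine original query terms (weight 1 each) with reverted-index scores."""
    weights = dict(retrieved)
    for term in original_terms:
        weights[term] = weights.get(term, 0.0) + 1.0
    # Highest weight first, as in the expanded query on the slide.
    return dict(sorted(weights.items(), key=lambda item: -item[1]))


def to_query(weights, limit):
    """Render the top `limit` terms in term^weight notation."""
    top = list(weights.items())[:limit]
    return " ".join(f"{term}^{round(weight, 2)}" for term, weight in top)


original = ["poaching", "wildlife", "preserves"]
# Scores returned by running the reverted query [#415 #56 #42 #85] (from the table).
retrieved = {
    "poaching": 1.0, "poachers": 0.57, "tsavo": 0.56, "leakey": 0.41,
    "tusks": 0.39, "elephants": 0.34, "wildlife": 0.24, "kws": 0.2,
}
expanded = expand(original, retrieved)
query = to_query(expanded, limit=7)
```

A term such as preserves, which the reverted query did not retrieve, keeps only its original weight of 1.0.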


Charts: MAP and %Change; Residual MAP and %Change

Efficiency

• Two components to query expansion
– Selection and Weighting
– Execution of Expanded Query

Charts: Avg Selection Time, Avg Execution Time

Why would execution be faster?

Bo1                       Reverted_PL2
Term              Score   Term              Score
leakey            0.88    poaching          1.00
poaching          0.74    poachers          0.56
wildlife          0.73    tsavo             0.56
kenya             0.52    leakey            0.41
ivory             0.47    tusks             0.39
elephants         0.46    elephants         0.34
elephant          0.32    wildlife          0.24
deer              0.30    kws               0.20
poachers          0.28    kez               0.17
conservation      0.27    ivory             0.14
species           0.23    jealousies        0.14
tusks             0.19    elephant          0.14
african           0.19    conservationists  0.09
namibia           0.19    kenya             0.09
animals           0.17    fiefdom           0.08
africa            0.15    safaris           0.04
zimbabwe          0.15    conservationist   0.03
tsavo             0.14    egos              0.01
kenyan            0.13    kierie            0.00
conservationists  0.12    aphrodisiacs      0.00

Bo1                       Reverted_PL2
Term              DF      Term              DF
africa            20390   wildlife          2891
african           10636   kenya             1163
conservation      4298    ivory             1014
animals           3928    elephant          743
species           3479    elephants         356
wildlife          2891    poaching          331
kenya             1163    conservationists  293
ivory             1014    egos              269
zimbabwe          966     kez               173
deer              748     fiefdom           129
elephant          743     conservationist   125
namibia           483     poachers          117
kenyan            436     safaris           57
elephants         356     jealousies        56
poaching          331     tusks             42
conservationists  293     leakey            22
poachers          117     tsavo             12
tusks             42      aphrodisiacs      12
leakey            22      kws               9
tsavo             12      kierie            2
Average DF        2617    Average DF        391
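The averages in the last row can be checked directly from the listed DF values. The sketch below also sums the DFs as a rough proxy for execution cost, under the assumption (a common simplification, not something measured here) that query execution time grows with the total length of the posting lists traversed.

```python
# Document frequencies of the top 20 expansion terms, copied from the table.
bo1_df = {
    "africa": 20390, "african": 10636, "conservation": 4298, "animals": 3928,
    "species": 3479, "wildlife": 2891, "kenya": 1163, "ivory": 1014,
    "zimbabwe": 966, "deer": 748, "elephant": 743, "namibia": 483,
    "kenyan": 436, "elephants": 356, "poaching": 331, "conservationists": 293,
    "poachers": 117, "tusks": 42, "leakey": 22, "tsavo": 12,
}
reverted_df = {
    "wildlife": 2891, "kenya": 1163, "ivory": 1014, "elephant": 743,
    "elephants": 356, "poaching": 331, "conservationists": 293, "egos": 269,
    "kez": 173, "fiefdom": 129, "conservationist": 125, "poachers": 117,
    "safaris": 57, "jealousies": 56, "tusks": 42, "leakey": 22,
    "tsavo": 12, "aphrodisiacs": 12, "kws": 9, "kierie": 2,
}


def average_df(df_by_term):
    """Mean document frequency over a set of expansion terms."""
    return sum(df_by_term.values()) / len(df_by_term)


def total_postings(df_by_term):
    """Sum of posting list lengths (assumed proxy for execution cost)."""
    return sum(df_by_term.values())


bo1_avg = average_df(bo1_df)
reverted_avg = average_df(reverted_df)
postings_ratio = total_postings(bo1_df) / total_postings(reverted_df)
```

For these 20 terms the Bo1 expansion touches roughly 6.7 times as many postings as the reverted-index expansion, which is consistent with the faster execution times.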

Bo1                   Reverted_PL2
Term          DF      Term            DF
los           46748   transportation  15262
angeles       45147   freeway         3506
metro         39849   tunnel          2643
safety        22569   disasters       1822
fire          21257   subway          805
foot          13120   extinguished    452
traffic       12410   rtd             227
feet          12034   caved           193
hollywood     7677    shoring         158
heat          6004    roper           147
rail          5747    timbers         98
downtown      5390    shored          97
engineers     4308    pilgrimages     73
freeway       3506    asphyxiation    71
disasters     1822    smolder         29
firefighters  1489    busway          22
subway        805     grouting        21
rtd           227     smoldered       19
timbers       98      lutgen          10
busway        22      droped          2
Average DF    12511   Average DF      1283


Related Work

Inspiration:

“Retrievability: An Evaluation Measure for Higher Order Information Access Tasks” --Azzopardi and Vinay, CIKM 2008

Azzopardi & Vinay take a document centric approach, examining whether documents (n)ever appear among top k results to any query

Related Work

Query-Document Duality has long history

– S. E. Robertson. “Query-Document Symmetry and Dual models.” Journal of Documentation, 50(3), 1994

– B. Billerbeck, F. Scholer, H. E. Williams, and J. Zobel. Query Expansion Using Associated Queries. CIKM '03

– N. Craswell and M. Szummer. Random walks on the Query-Click Graph. SIGIR 2007

– Reverse Querying / alerting (various)

Future Extensions

Basis queries
– Query expression may be arbitrarily complex
– Ranking function may be arbitrarily complex (remember: ranking function is a part of the basis query)

Reverted queries
– Best Match: [#415 #56 #42 #85]
– Boolean: (#415 AND #56) OR (#42 AND #85)
– Other query operators:
[SYNONYM(#415 #56) #42 #85]
[ORDERED(#415 #56) #42 #85]
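As an illustration of the Boolean form, the sketch below evaluates (#415 AND #56) OR (#42 AND #85) with plain set operations. The postings are invented for the example and the helper names are hypothetical. Each docid maps to the set of basis queries that retrieved it, so the answer is a set of basis queries.

```python
# Hypothetical reverted postings: docid -> basis queries that retrieved it.
postings = {
    "415": {"poaching", "wildlife", "tusks"},
    "56": {"poaching", "tusks"},
    "42": {"wildlife", "kenya"},
    "85": {"kenya", "safaris"},
}


def lookup(docid):
    """Basis queries whose result set contains docid (empty set if none)."""
    return postings.get(docid, set())


def all_of(*sets):
    """Boolean AND: basis queries present in every operand."""
    return set.intersection(*sets)


def any_of(*sets):
    """Boolean OR: basis queries present in at least one operand."""
    return set.union(*sets)


# (#415 AND #56) OR (#42 AND #85)
hits = any_of(
    all_of(lookup("415"), lookup("56")),
    all_of(lookup("42"), lookup("85")),
)
```

Here `hits` contains the basis queries that retrieved both documents of at least one pair; a Best Match query would rank basis queries by score instead of filtering them.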

Motivation – Three R’s

Retrievability

Reuse (Algorithmic)

Recall-Oriented Tasks

Fin

Questions?