Query Session Detection as a Cascade

Matthias Hagen, Benno Stein, Tino Rüb

Bauhaus-Universität Weimar
matthias.hagen@uni-weimar.de

CIKM 2011, Glasgow, Scotland
October 25, 2011

It’s quiz time!

What is the user searching for?

paris hilton

Without context . . .

paris hilton

source: [http://upload.wikimedia.org/wikipedia/commons/2/26/Paris Hilton 3 Crop.jpg]

What if you knew the previous queries?

paris hotels
paris marriott
paris hyatt
paris hilton

sources: [http://www.alison-anderson.com/wp-content/uploads/hilton hotel paris 2.jpg] [http://maps.google.com/] [http://upload.wikimedia.org/wikipedia/en/e/eb/HI mk logo hiltonbrandlogo.jpg]

Query sessions: same information need

The benefits

Improved understanding of user intent

Improved retrieval performance via session knowledge

The “minor” issue

Users do not announce when querying for a new information need.

A typical query log

User  Query                Click domain      Click rank  Time
42    istanbul             en.wikipedia.org  1           2011-10-22 20:34:17
42    istanbul archeology                                2011-10-23 12:02:54
42    istanbul archeology  www.turizm.tr     6           2011-10-23 12:03:15
42    istanbul archeology  www.arkeoloji.tr  13          2011-10-23 18:24:07
42    constantinople                                     2011-10-23 19:12:40
42    constantinople       en.wikipedia.org  4           2011-10-23 19:13:02
42    soccr glasgo                                       2011-10-23 19:16:01
42    soccer glasgow                                     2011-10-23 19:16:11
42    soccer glasgow       www.soccer.uk     3           2011-10-23 19:16:15
42    celtics vs rangers                                 2011-10-23 20:33:04
42    celtics vs rangers   en.wikipedia.org  5           2011-10-23 20:33:12
42    old firm                                           2011-10-23 22:42:48

How to determine the break points?

User  Query                Click domain      Click rank  Time
42    istanbul             en.wikipedia.org  1           2011-10-22 20:34:17
42    istanbul archeology                                2011-10-23 12:02:54
42    istanbul archeology  www.turizm.tr     6           2011-10-23 12:03:15
42    istanbul archeology  www.arkeoloji.tr  13          2011-10-23 18:24:07
42    constantinople                                     2011-10-23 19:12:40
42    constantinople       en.wikipedia.org  4           2011-10-23 19:13:02
- - - - - - - - - - - - - - - - - -
42    soccr glasgo                                       2011-10-23 19:16:01
42    soccer glasgow                                     2011-10-23 19:16:11
42    soccer glasgow       www.soccer.uk     3           2011-10-23 19:16:15
42    celtics vs rangers                                 2011-10-23 20:33:04
42    celtics vs rangers   en.wikipedia.org  5           2011-10-23 20:33:12
42    old firm                                           2011-10-23 22:42:48

The key is . . .

Automatic query session detection

Automatic query session detection

Usual “technique”

For each pair of consecutive queries, check whether the second query continues the same information need or starts a new one.

Example

User  Query                Time                 Compared with previous query
42    istanbul             2011-10-22 20:34:17
42    istanbul archeology  2011-10-23 18:24:07  same
42    constantinople       2011-10-23 19:12:40  same
- - - - - - - - -
42    soccer glasgow       2011-10-23 19:16:11  new
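
The pairwise check above can be sketched as a single pass over one user's queries. This is a minimal illustration, not the authors' implementation: the names `split_sessions` and `shares_term` are hypothetical, and the term-overlap predicate is only a placeholder for whatever same/new decision is actually used.

```python
from typing import Callable, List


def split_sessions(queries: List[str],
                   same_need: Callable[[str, str], bool]) -> List[List[str]]:
    """Group a user's time-ordered queries into sessions.

    A new session starts whenever same_need(previous, current) is False.
    """
    sessions: List[List[str]] = []
    for query in queries:
        if sessions and same_need(sessions[-1][-1], query):
            sessions[-1].append(query)   # continues the current session
        else:
            sessions.append([query])     # break point: start a new session
    return sessions


def shares_term(prev_query: str, query: str) -> bool:
    """Placeholder predicate (assumption): same need iff a term is shared."""
    return bool(set(prev_query.split()) & set(query.split()))


log = ["istanbul", "istanbul archeology", "constantinople", "soccer glasgow"]
sessions = split_sessions(log, shares_term)
```

With this purely lexical placeholder, "constantinople" ends up in its own session even though the example marks it as the same information need, which is the kind of case that calls for semantic features.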

Typical features

Temporal thresholds

5 minutes [Silverstein et al., 1999]

10–15 minutes [He and Göker, 2000]

30 minutes [Downey et al., 2007]

user specific [Murray et al., 2006]

Lexical similarity

n-gram overlap [Zhang and Moffat, 2006]

Levenshtein distance [Jones and Klinkner, 2008]

Semantic similarity

Search results [Radlinski and Joachims, 2005]

ESA [Lucchese et al., 2011]
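
For illustration, the temporal and lexical features can be computed along the following lines. This is a sketch under assumptions: the 30-minute default, character trigrams, and Jaccard overlap are choices made here for the example and are not taken from the cited papers; all function names are hypothetical.

```python
from datetime import datetime, timedelta


def within_time_threshold(t_prev: datetime, t_next: datetime,
                          threshold: timedelta = timedelta(minutes=30)) -> bool:
    """Temporal feature: True if the gap between two queries is small enough."""
    return (t_next - t_prev) <= threshold


def char_ngrams(text: str, n: int = 3) -> set:
    """Set of character n-grams of a lower-cased query."""
    text = text.lower()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def ngram_overlap(q1: str, q2: str, n: int = 3) -> float:
    """Lexical feature: Jaccard overlap of character n-gram sets."""
    a, b = char_ngrams(q1, n), char_ngrams(q2, n)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def levenshtein(a: str, b: str) -> int:
    """Lexical feature: edit distance via the standard dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


print(levenshtein("soccr glasgo", "soccer glasgow"))  # 2
```

The misspelled query from the example log is two edits away from its correction, so an edit-distance feature would treat the pair as closely related.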

Previous methods

Feature combinations

More accurate than single features

One of the best: Geometric method (time + lexical) [Gayo-Avello, 2009]

Shortcomings

All features evaluated simultaneously → unnecessary runtime

Geometric method ignores semantics → lost accuracy

Examples

Subset test suffices

soccer
soccer glasgow        (same)

Geometric method fails

celtics vs rangers
old firm              (same)

We address the shortcomings in a cascade . . .

source: [http://wp.ltchambon.com/wp-content/uploads/2010/09/Cascade-de-Tufs-Baume-les-messieurs-Jura.jpg]

. . . well . . . a small 4-step cascade

source: [http://www.solarshop.com/solarpix/Solar Cascade 4 Tier GreenL.jpg]

Step 1: Subset test

↘ Step 2: Geometric method

↘ Step 3: ESA similarity

↙ Step 4: Search results

Basic Idea

Feature cost (runtime) increases from step to step. Expensive features are evaluated only if the previous steps are “unreliable.”
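
The control flow of such a cascade might look like the sketch below. It assumes each step either returns a decision (True for same need, False for new need) or returns None to signal that it is unreliable for this pair; that three-valued interface, the `default` fallback, and all names are assumptions made for illustration.

```python
from typing import Callable, Optional, Sequence, Tuple

# A step returns True/False when confident, None when it cannot decide.
Step = Callable[[str, str], Optional[bool]]


def cascade_decide(prev_query: str, query: str, steps: Sequence[Step],
                   default: bool = False) -> Tuple[bool, int]:
    """Run steps from cheapest to most expensive; stop at the first decision.

    Returns the decision and the number of the step that made it.
    """
    for number, step in enumerate(steps, start=1):
        decision = step(prev_query, query)
        if decision is not None:
            return decision, number
    return default, len(steps)  # no step was confident


def identical_queries(prev_query: str, query: str) -> Optional[bool]:
    """Hypothetical cheap step: only confident about exact repetitions."""
    return True if prev_query == query else None


def always_new(prev_query: str, query: str) -> Optional[bool]:
    """Hypothetical last step: always decides, here with 'new'."""
    return False


print(cascade_decide("old firm", "old firm", [identical_queries, always_new]))  # (True, 1)
```

Because the loop returns as soon as a step is confident, the expensive steps run only for the pairs that the cheap ones leave open.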

Step 1: Subset test

User  Query                Click domain      Click rank  Time
42    istanbul             en.wikipedia.org  1           2011-10-22 20:34:17
42    istanbul archeology                                2011-10-23 12:02:54
42    istanbul archeology  www.turizm.tr     6           2011-10-23 12:03:15
42    istanbul archeology  www.arkeoloji.tr  13          2011-10-23 18:24:07
- - - - - - - - - - - - - - - - - -
42    constantinople                                     2011-10-23 19:12:40
42    constantinople       en.wikipedia.org  4           2011-10-23 19:13:02
- - - - - - - - - - - - - - - - - -
42    soccr glasgo                                       2011-10-23 19:16:01
- - - - - - - - - - - - - - - - - -
42    soccer glasgow                                     2011-10-23 19:16:11
42    soccer glasgow       www.soccer.uk     3           2011-10-23 19:16:15
- - - - - - - - - - - - - - - - - -
42    celtics vs rangers                                 2011-10-23 20:33:04
42    celtics vs rangers   en.wikipedia.org  5           2011-10-23 20:33:12
- - - - - - - - - - - - - - - - - -
42    old firm                                           2011-10-23 22:42:48
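
A subset test consistent with the break points above could be sketched as follows. It assumes whitespace tokenization and treats a pair as "same" when one query's term set contains the other's (repetition, specialization, or generalization); returning None for every other pair, so that a later step decides, is an assumption of this sketch.

```python
from typing import Optional


def subset_test(prev_query: str, query: str) -> Optional[bool]:
    """Return True if one query's terms are a subset of the other's.

    Returns None (undecided) otherwise, leaving the pair to later steps.
    """
    prev_terms = set(prev_query.lower().split())
    terms = set(query.lower().split())
    if prev_terms and terms and (prev_terms <= terms or terms <= prev_terms):
        return True
    return None


print(subset_test("istanbul", "istanbul archeology"))  # True
print(subset_test("soccr glasgo", "soccer glasgow"))   # None
```

The misspelled pair is left undecided because no term matches exactly, which agrees with the break that is still present between those two queries after Step 1.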

Step 2: Geometric method [Gayo-Avello, 2009]

User  Query                Click domain      Click rank  Time
42    istanbul             en.wikipedia.org  1           2011-10-22 20:34:17
42    istanbul archeology                                2011-10-23 12:02:54
42    istanbul archeology  www.turizm.tr     6           2011-10-23 12:03:15
42    istanbul archeology  www.arkeoloji.tr  13          2011-10-23 18:24:07
- - - - - - - - - - - - - - - - - -
42    constantinople                                     2011-10-23 19:12:40
42    constantinople       en.wikipedia.org  4           2011-10-23 19:13:02
- - - - - - - - - - - - - - - - - -
42    soccr glasgo                                       2011-10-23 19:16:01
42    soccer glasgow                                     2011-10-23 19:16:11
42    soccer glasgow       www.soccer.uk     3           2011-10-23 19:16:15
- - - - - - - - - - - - - - - - - -
42    celtics vs rangers                                 2011-10-23 20:33:04
42    celtics vs rangers   en.wikipedia.org  5           2011-10-23 20:33:12
- - - - - - - - - - - - - - - - - -
42    old firm                                           2011-10-23 22:42:48
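
The slides do not spell out the geometric method's formula, so the following is only a sketch in its spirit: a time feature and a lexical feature are combined, and the pair counts as "same" when the point they define lies on or outside the unit circle. The 24-hour normalization, the character 3- to 5-grams, the cosine similarity, and the comparison against only the previous query are all assumptions of this sketch.

```python
import math
from collections import Counter
from datetime import datetime


def char_ngram_counts(text: str, n_min: int = 3, n_max: int = 5) -> Counter:
    """Count character n-grams of all lengths from n_min to n_max."""
    counts: Counter = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def geometric_same_session(prev_query: str, prev_time: datetime,
                           query: str, time: datetime,
                           max_gap_hours: float = 24.0) -> bool:
    """Same session iff sqrt(f_time^2 + f_lex^2) >= 1 (assumed rule)."""
    gap_hours = (time - prev_time).total_seconds() / 3600.0
    f_time = max(0.0, 1.0 - gap_hours / max_gap_hours)
    f_lex = cosine(char_ngram_counts(prev_query.lower()),
                   char_ngram_counts(query.lower()))
    return math.hypot(f_time, f_lex) >= 1.0


typo_fixed = geometric_same_session(
    "soccr glasgo", datetime(2011, 10, 23, 19, 16, 1),
    "soccer glasgow", datetime(2011, 10, 23, 19, 16, 11))
old_firm = geometric_same_session(
    "celtics vs rangers", datetime(2011, 10, 23, 20, 33, 12),
    "old firm", datetime(2011, 10, 23, 22, 42, 48))
print(typo_fixed, old_firm)  # True False
```

Under these assumptions the typo correction is merged, while "old firm" stays separate because it shares no character n-gram with "celtics vs rangers", matching the break points shown for Step 2.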

Step 3: Explicit Semantic Analysis [Gabrilovich and Markovitch, 2007]

User  Query                Click domain      Click rank  Time
42    istanbul             en.wikipedia.org  1           2011-10-22 20:34:17
42    istanbul archeology                                2011-10-23 12:02:54
42    istanbul archeology  www.turizm.tr     6           2011-10-23 12:03:15
42    istanbul archeology  www.arkeoloji.tr  13          2011-10-23 18:24:07
42    constantinople                                     2011-10-23 19:12:40
42    constantinople       en.wikipedia.org  4           2011-10-23 19:13:02
- - - - - - - - - - - - - - - - - -
42    soccr glasgo                                       2011-10-23 19:16:01
42    soccer glasgow                                     2011-10-23 19:16:11
42    soccer glasgow       www.soccer.uk     3           2011-10-23 19:16:15
42    celtics vs rangers                                 2011-10-23 20:33:04
42    celtics vs rangers   en.wikipedia.org  5           2011-10-23 20:33:12
- - - - - - - - - - - - - - - - - -
42    old firm                                           2011-10-23 22:42:48
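
Explicit Semantic Analysis represents a text as a weighted vector over concepts (Wikipedia articles in the original work) and compares texts by the cosine of these vectors. The toy version below only illustrates that idea: the three "articles" in `TOY_CONCEPTS` are invented stand-ins for a real Wikipedia index, and the tf-idf weighting is a simplified assumption.

```python
import math
from collections import Counter
from typing import Dict

# Hypothetical miniature concept collection (not real Wikipedia text).
TOY_CONCEPTS = {
    "Istanbul": "istanbul constantinople bosphorus archeology city",
    "Football in Glasgow": "glasgow soccer football celtics rangers city",
    "Paris": "paris france hotels city",
}


def build_index(concepts: Dict[str, str]) -> Dict[str, Dict[str, float]]:
    """Inverted index: term -> {concept: tf-idf weight}."""
    n = len(concepts)
    tf = {name: Counter(text.lower().split()) for name, text in concepts.items()}
    df = Counter(term for counts in tf.values() for term in counts)
    index: Dict[str, Dict[str, float]] = {}
    for name, counts in tf.items():
        for term, count in counts.items():
            index.setdefault(term, {})[name] = count * math.log(1 + n / df[term])
    return index


def esa_vector(text: str, index: Dict[str, Dict[str, float]]) -> Counter:
    """Concept vector of a text: sum of its terms' concept weights."""
    vector: Counter = Counter()
    for term in text.lower().split():
        for concept, weight in index.get(term, {}).items():
            vector[concept] += weight
    return vector


def esa_similarity(a: str, b: str, index: Dict[str, Dict[str, float]]) -> float:
    """Cosine similarity of the two concept vectors."""
    va, vb = esa_vector(a, index), esa_vector(b, index)
    dot = sum(va[c] * vb[c] for c in va.keys() & vb.keys())
    norm = (math.sqrt(sum(v * v for v in va.values()))
            * math.sqrt(sum(v * v for v in vb.values())))
    return dot / norm if norm else 0.0


INDEX = build_index(TOY_CONCEPTS)
related = esa_similarity("istanbul archeology", "constantinople", INDEX)
unrelated = esa_similarity("constantinople", "soccer glasgow", INDEX)
```

In this toy index "istanbul archeology" and "constantinople" map to the same concept although they share no term, which is the effect that lets Step 3 remove the break between them.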

Step 4: Search results

User  Query                Click domain      Click rank  Time
42    istanbul             en.wikipedia.org  1           2011-10-22 20:34:17
42    istanbul archeology                                2011-10-23 12:02:54
42    istanbul archeology  www.turizm.tr     6           2011-10-23 12:03:15
42    istanbul archeology  www.arkeoloji.tr  13          2011-10-23 18:24:07
42    constantinople                                     2011-10-23 19:12:40
42    constantinople       en.wikipedia.org  4           2011-10-23 19:13:02
- - - - - - - - - - - - - - - - - -
42    soccr glasgo                                       2011-10-23 19:16:01
42    soccer glasgow                                     2011-10-23 19:16:11
42    soccer glasgow       www.soccer.uk     3           2011-10-23 19:16:15
42    celtics vs rangers                                 2011-10-23 20:33:04
42    celtics vs rangers   en.wikipedia.org  5           2011-10-23 20:33:12
42    old firm                                           2011-10-23 22:42:48
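
One plausible way to compare two queries by their search results is the overlap of their top-ranked result URLs. The sketch below assumes Jaccard overlap of the top k results and a threshold of 0.1; both values, the `search` callback, and the canned example.org result lists are hypothetical, since the slides do not describe how the results are compared.

```python
from typing import Callable, Dict, List


def result_overlap(results_a: List[str], results_b: List[str], k: int = 10) -> float:
    """Jaccard overlap of the top-k result URLs of two queries."""
    a, b = set(results_a[:k]), set(results_b[:k])
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def same_need_by_results(prev_query: str, query: str,
                         search: Callable[[str], List[str]],
                         threshold: float = 0.1, k: int = 10) -> bool:
    """Decide 'same' if the result lists overlap enough (assumed rule)."""
    return result_overlap(search(prev_query), search(query), k) >= threshold


# Canned, made-up result lists standing in for a real search engine.
CANNED: Dict[str, List[str]] = {
    "celtics vs rangers": ["https://example.org/old-firm",
                           "https://example.org/celtic-fc",
                           "https://example.org/rangers-fc"],
    "old firm": ["https://example.org/old-firm",
                 "https://example.org/glasgow-derby"],
    "constantinople": ["https://example.org/istanbul"],
}


def canned_search(query: str) -> List[str]:
    return CANNED.get(query, [])


print(result_overlap(CANNED["celtics vs rangers"], CANNED["old firm"]))  # 0.25
```

Each comparison needs the result lists of both queries, which is why this kind of feature is by far the most expensive and is placed last in the cascade.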

That’s the complete cascade

source: [http://www.solarshop.com/solarpix/Solar Cascade 4 Tier GreenL.jpg]

Step 1: Subset test

↘ Step 2: Geometric method

↘ Step 3: ESA similarity

↙ Step 4: Search results

What about accuracy and runtime?

Experimental Evaluation

Accuracy on Gayo-Avello’s corpus (11 000 queries, 2.7 per session)

            Precision  Recall  F-Measure (β = 1.5)
Geometric   0.8673     0.9431  0.9184
Cascading   0.8618     0.9676  0.9328

Performance per step

        decides  F-Measure  time     factor
Step 1  40.49%   0.8303     0.08 ms  1.0
Step 2  35.15%   0.9292     0.20 ms  2.5
Step 3   2.05%   0.9316     0.27 ms  3.4
Step 4   0.85%   0.9328     9.85 ms  123.1

Remark: Without Step 4 about 2 700 queries per second!
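
For reference, the weighted F-measure with β = 1.5 can be computed as below. That the slides use exactly this standard definition is an assumption, although it reproduces the reported 0.9184 for the geometric method from its rounded precision and recall.

```python
def f_measure(precision: float, recall: float, beta: float = 1.5) -> float:
    """Weighted harmonic mean of precision and recall.

    beta > 1 weights recall more heavily than precision.
    """
    if precision == 0 and recall == 0:
        return 0.0
    beta_sq = beta * beta
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


geometric = f_measure(0.8673, 0.9431)
print(round(geometric, 4))  # 0.9184
```

With β = 1.5 a gain in recall counts for more than an equal loss in precision, which fits the comparison above: the cascade gives up a little precision for noticeably higher recall.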

Almost the end: The take-away messages!

What we have (not) done

Results

Cascading method

Cheap features first

Beats geometric

3-step version: simple, fast, high-quality sessions

Future Work

Postprocessing for multi-tasking

Postprocessing for goals/missions

Thank you.
