Mining Term Association Patterns from Search Logs for Effective Query Reformulation

Xuanhui Wang and ChengXiang Zhai

Department of Computer Science

University of Illinois at Urbana-Champaign

ACM CIKM 2008, Oct. 26-30, Napa Valley

Ineffective Queries

reduce space command latex

Effective Queries

squeeze space command latex

More Examples

• If you want to wash your vehicle

– “vehicle wash”, “auto wash”

– “car wash”, “truck wash”

• If you want to buy a car

– “auto quotes”

– “auto sale quotes”?

– “auto insurance quotes”?

What Makes a Query Ineffective?

• Vocabulary mismatch

– “reduce space command latex” vs “squeeze space command latex”

– “auto wash” vs “car wash”

• Lack of discrimination

– “auto quotes” vs “auto sale quotes”

• …

How can we help improve ineffective queries?

Term substitution

Term addition

Our Contribution

• We cast query reformulation as term-level pattern mining from search logs

• We define two basic types of patterns at the term level and propose probabilistic methods

– Context-sensitive term substitution

• “auto → car | _wash”, “car → auto | _trade”

– Context-sensitive term addition

• “+sale | auto_quotes”

• We evaluate our methods on commercial search engine logs and show their effectiveness

Problem Formulation

Offline part: Search logs → Query collection → Task 1: Contextual models → Task 2: Translation models

Online part: q = auto wash → Task 3: Pattern mining → Patterns → reformulated queries

Patterns:

– auto → car | _wash

– auto → truck | _wash

– +southland | _auto wash

– …

Reformulated queries: car wash, truck wash, southland auto wash
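
The pattern notation on this slide can be made concrete with a small sketch. The `Pattern` class, its field names, and the `apply_pattern` and `reformulate` helpers are hypothetical and only illustrate how a mined pattern might be applied online; the slide does not say how the authors store or match patterns. Here the slot "_" is assumed to be matched against the terms immediately adjacent to it.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Pattern:
    """Hypothetical encoding of a mined pattern; '_' in the slide notation is the slot."""
    new_term: str                    # term that fills the slot
    left: Tuple[str, ...] = ()       # context terms immediately left of the slot
    right: Tuple[str, ...] = ()      # context terms immediately right of the slot
    old_term: Optional[str] = None   # term being replaced; None means an addition pattern


def apply_pattern(query: str, p: Pattern) -> list:
    """Return every reformulation of `query` that pattern `p` licenses."""
    terms = query.split()
    width = 0 if p.old_term is None else 1  # the slot covers a term only for substitution
    results = []
    for i in range(len(terms) - width + 1):
        if width and terms[i] != p.old_term:
            continue
        left = tuple(terms[max(0, i - len(p.left)):i])
        right = tuple(terms[i + width:i + width + len(p.right)])
        if left == p.left and right == p.right:
            results.append(" ".join(terms[:i] + [p.new_term] + terms[i + width:]))
    return results


PATTERNS = [
    Pattern("car", right=("wash",), old_term="auto"),    # auto -> car | _wash
    Pattern("truck", right=("wash",), old_term="auto"),  # auto -> truck | _wash
    Pattern("southland", right=("auto", "wash")),        # +southland | _auto wash
]


def reformulate(query: str, patterns: list) -> list:
    return [q for p in patterns for q in apply_pattern(query, p)]


print(reformulate("auto wash", PATTERNS))
# ['car wash', 'truck wash', 'southland auto wash']
```

Treating substitution and addition as the same operation with a slot of width one or zero keeps the matching logic in one place.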

Task 1: Contextual Models

G: General context

• Syntagmatic relations

• Capture terms that frequently co-occur with w inside queries

Sample query collection: enterprise car rental; rental car; budget car rental; car pricing; car pictures; car accidents; …

Model P_G(· | car): rental: 0.375, enterprise: 0.125, budget: 0.125, pricing: 0.125, …
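
The numbers on this slide can be reproduced with a plain relative-frequency estimate. The sketch below assumes the collection is exactly the six queries visible on the slide (the slide's "…" is ignored) and uses no smoothing; the function name `general_context_model` is illustrative, not taken from the authors.

```python
from collections import Counter

# The six sample queries shown on the slide.
QUERIES = [
    "enterprise car rental",
    "rental car",
    "budget car rental",
    "car pricing",
    "car pictures",
    "car accidents",
]


def general_context_model(queries, w):
    """Relative frequency of terms that co-occur with w anywhere in a query."""
    counts = Counter()
    for query in queries:
        terms = query.split()
        if w in terms:
            counts.update(t for t in terms if t != w)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()} if total else {}


p_g = general_context_model(QUERIES, "car")
print(p_g["rental"], p_g["enterprise"])  # 0.375 0.125
```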

Task 1: Contextual Models

L1: 1st left context

• Syntagmatic relations

• Capture terms that frequently co-occur with w inside queries

Sample query collection: enterprise car rental; rental car; budget car rental; car pricing; car pictures; car accidents; …

Model P_L1(· | car): rental: 0.333, enterprise: 0.333, budget: 0.333, …

Task 1: Contextual Models

R1: 1st right context

• Syntagmatic relations

• Capture terms that frequently co-occur with w inside queries

Sample query collection: enterprise car rental; rental car; budget car rental; car pricing; car pictures; car accidents; …

Model P_R1(· | car): rental: 0.4, pricing: 0.2, pictures: 0.2, accidents: 0.2, …
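
The positional models differ from the general context only in which neighbor is counted. A minimal sketch, again assuming the six visible sample queries are the whole collection and using unsmoothed counts; `positional_context_model` and its `offset` parameter are illustrative names.

```python
from collections import Counter

QUERIES = [
    "enterprise car rental",
    "rental car",
    "budget car rental",
    "car pricing",
    "car pictures",
    "car accidents",
]


def positional_context_model(queries, w, offset):
    """Relative frequency of the term at a fixed offset from w.

    offset=-1 gives the L1 (first left) context, offset=+1 the R1
    (first right) context; larger magnitudes give L2, R2, and so on.
    """
    counts = Counter()
    for query in queries:
        terms = query.split()
        for i, term in enumerate(terms):
            j = i + offset
            if term == w and 0 <= j < len(terms):
                counts[terms[j]] += 1
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()} if total else {}


p_l1 = positional_context_model(QUERIES, "car", -1)
p_r1 = positional_context_model(QUERIES, "car", +1)
print(round(p_l1["rental"], 3), p_r1["rental"], p_r1["pricing"])  # 0.333 0.4 0.2
```

Keeping L1 and R1 separate preserves word order, which the general context discards: "rental" is a likely right neighbor of "car" but only one of three equally likely left neighbors.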

Task 2: Translation Models

• Paradigmatic relations (“car” and “auto”)

• Capture terms that are substitutable with w

• Similar contexts → high translation probability

• Translation models

[Equation not recovered; its annotations read: probability of generating s’s context from w’s contextual model; size of L1 context; size of R1 context]
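
Because the slide's formula did not survive, the following is only one plausible instantiation of "similar contexts give high translation probability", not necessarily the authors' estimator. It assumes the score for a candidate s is exp(-KL) between s's context model and an additively smoothed version of w's context model, normalized over candidates, and that the L1 and R1 scores are mixed in proportion to the context sizes. All function names, the `eps` smoothing constant, and the toy probabilities are made up for illustration.

```python
import math


def kl_to_smoothed(p_s, p_w, vocab, eps=0.01):
    """KL(P(.|s) || smoothed P(.|w)); additive smoothing keeps it finite."""
    norm = 1.0 + eps * len(vocab)
    return sum(
        p * math.log(p / ((p_w.get(t, 0.0) + eps) / norm))
        for t, p in p_s.items()
        if p > 0
    )


def translation_model(w, context_models, eps=0.01):
    """t(s | w) for one context type, normalized over all candidates s != w."""
    vocab = {t for model in context_models.values() for t in model}
    scores = {
        s: math.exp(-kl_to_smoothed(p_s, context_models[w], vocab, eps))
        for s, p_s in context_models.items()
        if s != w
    }
    z = sum(scores.values())
    return {s: v / z for s, v in scores.items()}


def combine(t_l1, t_r1, size_l1, size_r1):
    """Mix the L1 and R1 translation models, weighted by context size."""
    total = size_l1 + size_r1
    return {
        s: (size_l1 * t_l1.get(s, 0.0) + size_r1 * t_r1.get(s, 0.0)) / total
        for s in set(t_l1) | set(t_r1)
    }


# Toy R1 context models with made-up probabilities.
R1 = {
    "car": {"rental": 0.4, "wash": 0.4, "pricing": 0.2},
    "auto": {"rental": 0.3, "wash": 0.5, "insurance": 0.2},
    "pizza": {"delivery": 0.7, "hut": 0.3},
}
t_r1 = translation_model("car", R1)
print(max(t_r1, key=t_r1.get))  # auto
```

In this toy data "auto" shares most of its right context with "car" and so receives almost all of the probability mass, while "pizza" shares none.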

Task 3.1: Pattern Mining–Term Substitution

q = [w_1 … w_{i-1} w_i w_{i+1} … w_n]

q’ = [w_1 … w_{i-1} s w_{i+1} … w_n]

Substitute w_i by s

Which word s should be chosen?

– Global factor: translation model

– Local factor

Estimating Local Factor

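Independence assumption: each term of the context w_1 … w_{i-1} _ w_{i+1} … w_n is generated from s independently

$$P(w_1 \ldots w_{i-1}\,\_\,w_{i+1} \ldots w_n \mid s) \approx \tilde{P}_{L_{i-1}}(w_1 \mid s) \cdots \tilde{P}_{L_1}(w_{i-1} \mid s)\;\tilde{P}_{R_1}(w_{i+1} \mid s) \cdots \tilde{P}_{R_{n-i}}(w_n \mid s)$$

Ignore terms that are far away from position i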
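
A rough sketch of how the local factor might be computed and combined with the global factor. It assumes the final score is the product of the translation probability and the local factor, that "far away" means more than `k` positions from the slot, and that unseen context terms get a small floor probability instead of proper smoothing. The `ctx_prob` callable, `k`, `floor`, and the toy numbers are all hypothetical.

```python
def local_factor(query_terms, i, s, ctx_prob, k=2, floor=1e-6):
    """Product of positional context probabilities of the other query terms given s.

    ctx_prob(term, s, offset) is assumed to return the (smoothed) probability of
    `term` at signed `offset` from s: -1 for L1, +1 for R1, and so on.
    """
    score = 1.0
    for j, term in enumerate(query_terms):
        offset = j - i
        if offset == 0 or abs(offset) > k:
            continue  # skip the slot itself and terms far away from it
        score *= max(ctx_prob(term, s, offset), floor)
    return score


def rank_substitutions(query_terms, i, translation, ctx_prob, k=2):
    """Rank candidates s for position i by global factor times local factor."""
    scored = {
        s: t * local_factor(query_terms, i, s, ctx_prob, k)
        for s, t in translation.items()
    }
    return sorted(scored, key=scored.get, reverse=True)


# Toy models with made-up probabilities.
TRANSLATION_FROM_AUTO = {"car": 0.5, "truck": 0.3, "automobile": 0.2}
POSITIONAL = {
    ("car", 1): {"wash": 0.3},
    ("truck", 1): {"wash": 0.1},
    ("automobile", 1): {"wash": 0.01},
}


def toy_ctx_prob(term, s, offset):
    return POSITIONAL.get((s, offset), {}).get(term, 0.0)


ranking = rank_substitutions(["auto", "wash"], 0, TRANSLATION_FROM_AUTO, toy_ctx_prob)
print(ranking)  # ['car', 'truck', 'automobile']
```

The local factor is what makes substitution context sensitive: a candidate with a high translation probability is still ranked low if it rarely appears next to the remaining query terms.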

Task 3.2: Pattern Mining–Term Addition

q = [w_1 … w_{i-1} w_i … w_n]

q’ = [w_1 … w_{i-1} r w_i … w_n]

Adding r before w_i

[Equation not recovered; its annotations read: similar to the local factor in term substitution patterns; uniform]
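
One way to read the two annotations is that a candidate r is scored by a uniform prior times a context likelihood of the same form as the substitution local factor. The sketch below follows that reading; it is an assumption, as are the `ctx_prob` callable, the window size `k`, the floor value, and the toy probabilities.

```python
def addition_score(query_terms, i, r, ctx_prob, k=2, floor=1e-6):
    """Score inserting r before position i of the query.

    The uniform prior over r is a constant and is therefore dropped. Terms left
    of the insertion point get negative offsets, terms right of it positive ones.
    """
    score = 1.0
    for j, term in enumerate(query_terms):
        offset = j - i if j < i else j - i + 1
        if abs(offset) <= k:
            score *= max(ctx_prob(term, r, offset), floor)
    return score


def best_additions(query_terms, candidates, ctx_prob, k=2, top=3):
    """Try every candidate at every insertion point and keep the best queries."""
    scored = [
        (addition_score(query_terms, i, r, ctx_prob, k),
         query_terms[:i] + [r] + query_terms[i:])
        for r in candidates
        for i in range(len(query_terms) + 1)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [" ".join(terms) for _, terms in scored[:top]]


# Toy positional models with made-up probabilities.
POSITIONAL = {
    ("sale", -1): {"auto": 0.2},
    ("sale", 1): {"quotes": 0.3},
    ("insurance", -1): {"auto": 0.3},
    ("insurance", 1): {"quotes": 0.4},
}


def toy_ctx_prob(term, r, offset):
    return POSITIONAL.get((r, offset), {}).get(term, 0.0)


suggestions = best_additions(["auto", "quotes"], ["sale", "insurance"], toy_ctx_prob, top=2)
print(suggestions)  # ['auto insurance quotes', 'auto sale quotes']
```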

Evaluation: Data Preparation

• From Microsoft Live Labs

• Logs from 5/1/2006 to 5/31/2006, split at 5/20/2006

– History logs (5/1/2006 to 5/20/2006): history collection of 4.4M queries, 1.6M of them distinct

– Future logs (5/20/2006 to 5/31/2006): 1.3M user sessions, used to construct test cases

Examples of Contextual Models

• Left and Right contexts are different

• The general context mixes them together

Examples of Translation Models

• Conceptually similar keywords have high translation probabilities

• They make interactive exploratory search possible

Examples of Term Substitution

• Substitution is context sensitive

• Intuitively, reworded queries are more effective

Effectiveness Comparison of Term Substitution – Experiment Design

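Session: queries Q1, Q2, …, Qk, with results R21, R22, R23 and Rk1, Rk2, Rk3, and clicked documents C1, C2, C3

How well can a reformulated query rank C1, C2, and C3 at the top?

Q1 → reformulation → Q1’, Q2’, Q3’, …

Top-5 results of each reformulation (dx = a document other than C1, C2, C3):

– Q1’: dx, C3, C1, C2, dx (P@5 = 0.6)

– Q2’: dx, C1, dx, dx, dx (P@5 = 0.2)

– Q3’: dx, C2, dx, C3, dx (P@5 = 0.4)

Best P@5 = 0.6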
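
The measure on this slide can be written down directly. The sketch below computes P@5 for each reformulated query against the clicked documents of the session and reports the best value, using the three rankings from the slide; the placeholder names `dx1` through `dx10` for non-clicked documents and the function names are illustrative.

```python
def precision_at_k(ranked_docs, clicked, k=5):
    """Fraction of the top-k results that are among the clicked documents."""
    return sum(1 for doc in ranked_docs[:k] if doc in clicked) / k


def best_precision_at_k(rankings, clicked, k=5):
    """Best P@k over all reformulations recommended for one original query."""
    return max(precision_at_k(docs, clicked, k) for docs in rankings.values())


CLICKED = {"C1", "C2", "C3"}

# Top-5 results of each reformulation, as drawn on the slide.
RANKINGS = {
    "Q1'": ["dx1", "C3", "C1", "C2", "dx2"],
    "Q2'": ["dx3", "C1", "dx4", "dx5", "dx6"],
    "Q3'": ["dx7", "C2", "dx8", "C3", "dx9"],
}

for name, docs in RANKINGS.items():
    print(name, precision_at_k(docs, CLICKED))
print("best", best_precision_at_k(RANKINGS, CLICKED))
```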

Results

Our method reformulates queries more effectively

[Figure: comparison of [Jones’06] and our method; x-axis: number of recommended queries]

Term Addition Patterns

Term addition patterns can refine a broad query

Related Work

• Query suggestions [e.g., Jones’06, Sahami et al.’06]

– Discover patterns at the query level

– Rely on external resources or training data

– Do not consider effectiveness

• Query modifications in IR [Rocchio’71, Anick’03]

– Expand queries from returned documents

– Do not rely on search logs; mostly add terms

• Related work in the NLP community [Lin’98, Rapp’02]

– Find synonyms or near-synonyms

– Syntagmatic and paradigmatic relations

– Not used for query reformulation

Conclusions and Future Work

• We propose a new way to mine search logs for patterns to address ineffective queries

– Vocabulary mismatch

– Lack of discrimination

• We define and mine two basic patterns at the term level

– Context-sensitive term substitution patterns

– Context-sensitive term addition patterns

• Experiments show the effectiveness of our methods

• In the future

– Use relevance judgments instead of clicks

– Exploit click information for better query reformulation

Thank You!

