
IR Metrics

Metrics Galore

Why Do Users Search?

Audience Exercise

A Model for User Search Behavior

Query Variation in Action

Recall-Based Metrics

Summary

What Would We Like IR Metrics to Measure?

Alistair Moffat

with (most recently) thanks to: Peter Bailey, Falk Scholer, Paul Thomas

NTCIR 2016


Overview

Metrics Galore

Why Do Users Search?

Audience Exercise (you will need your smartphone)

A Model for User Search Behavior

Query Variation in Action

Recall-Based Metrics

Summary


The Library Catalog



Online Search

During the 1970s and 1980s, online “search” was described by Boolean expressions, was guarded by librarians, and cost real money (an international phone call to the US at ≈ $5/minute).

Students and staff were allowed one search session per year, typically 30 minutes of query formulation and trial-and-error, seeking out a “goldilocks” query, determined primarily by answer-set size.

Then a “print abstracts” command, and a ten-day wait for airmail from the States. Then library interloan requests.


How Was Search Measured?

To measure search quality, binary relevance labels were assumed.

Precision indicated how satisfied the user was (or should have been) with what they were given; recall indicated how disappointed the user would be if they could somehow know about the documents they had missed.

Plus, the two could be combined into F1 by taking their harmonic mean.
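A minimal sketch of the set-based measures just described, assuming binary relevance and that both the retrieved answer set and the full relevant set are known; the function name and document IDs are hypothetical.

```python
def precision_recall_f1(retrieved: set, relevant: set) -> tuple[float, float, float]:
    """Set-based precision, recall and F1 under binary relevance."""
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    # F1 is the harmonic mean of precision and recall
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical example: 4 documents retrieved, 3 of them relevant;
# 6 relevant documents exist in the collection.
p, r, f = precision_recall_f1({"d1", "d2", "d3", "d4"},
                              {"d1", "d2", "d3", "d7", "d8", "d9"})
print(p, r, round(f, 3))  # 0.75 0.5 0.6
```

The harmonic mean is dominated by the smaller of the two values, so a system cannot score well on F1 by maximizing only one of precision or recall.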


Enter the 1990s

Ranked retrieval emerged as a serious tool in the late 1980s; the first TREC collaboration was in 1992.

Computers and implementation techniques reached the stage where skilled amateurs (that is, academics) could index hundreds of megabytes or even (gasp) small numbers of gigabytes.


Enter the 1990s (1994)


Measuring Ranked Lists

Now there were problems:

- there was no longer an answer set being generated, and
- it was no longer possible to just “know” what the correct answers should have been.

On the assumption that rankings are (always?) truncated at depth k, one could measure using precision@k and (maybe?) recall@k.
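A minimal sketch of precision@k and recall@k on a truncated ranking, assuming binary relevance, a ranking given as a list of document IDs, and a known relevant set; the function names and IDs are hypothetical. Note that recall@k needs the full set of relevant documents, which is exactly what becomes hard to know for ranked retrieval over large collections.

```python
def precision_at_k(ranking, relevant, k):
    """Fraction of the top-k positions holding relevant documents.
    Positions beyond the end of a short ranking count as non-relevant."""
    return sum(1 for d in ranking[:k] if d in relevant) / k

def recall_at_k(ranking, relevant, k):
    """Fraction of all relevant documents that appear in the top k."""
    if not relevant:
        return 0.0
    return sum(1 for d in ranking[:k] if d in relevant) / len(relevant)

ranking = ["d3", "d9", "d1", "d5", "d2"]
relevant = {"d1", "d2", "d3", "d7"}
print(precision_at_k(ranking, relevant, 3), recall_at_k(ranking, relevant, 3))
```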

A bigger problem is that human nature is to seek just a single number. Fastest. Tallest. Richest.

Best retrieval effectiveness...


Measuring Ranked Lists

Combinations were developed:

- 3-point average precision
- 11-point average precision (interpolated or not)
- (all-relevant-points) average precision (AP).

Plus shallower measures with a lighter judgment load:

- reciprocal rank (RR).

In TREC experimentation through the 1990s, AP emerged as the main metric used to compare systems.
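A minimal sketch of AP and RR, assuming binary relevance, a ranking without duplicate document IDs, and a known relevant set; the function names and IDs are hypothetical. AP averages precision at each rank where a relevant document appears, with unretrieved relevant documents contributing zero, which is why it needs the full relevant set; RR needs only the position of the first relevant document.

```python
def average_precision(ranking, relevant):
    """AP: sum of precision@i at each relevant rank i, divided by |relevant|."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for i, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / i  # precision at this relevant position
    return total / len(relevant)

def reciprocal_rank(ranking, relevant):
    """RR: 1 / rank of the first relevant document, or 0 if none is retrieved."""
    for i, doc in enumerate(ranking, start=1):
        if doc in relevant:
            return 1.0 / i
    return 0.0

ranking = ["d9", "d3", "d5", "d1"]
relevant = {"d1", "d3", "d7"}  # d7 is relevant but never retrieved
print(average_precision(ranking, relevant), reciprocal_rank(ranking, relevant))
```

Here the relevant documents sit at ranks 2 and 4, so AP is (1/2 + 2/4) / 3 = 1/3 and RR is 1/2.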


Enter the 2000s (1999)

By the late 1990s, free commercial web search was a rapidly growing industry.

And academics were comfortable working with multi-gigabyte text files.

But even so...


Enter the 2000s

More metrics arrived:

- 2002: discounted cumulative gain (DCG), normalized discounted cumulative gain (NDCG) [Järvelin & Kekäläinen]
- 2004: BPref [Buckley & Voorhees]
- 2008: Q-Measure [Sakai & Kando]
- 2008: rank-biased precision (RBP) [Moffat & Zobel]
- 2009: expected reciprocal rank (ERR) [Chapelle et al.]
- 2012: time-biased gain (TBG) [Smucker & Clarke]

Plus variants for faceted retrieval, inferred relevance based on sampling, etc.
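A sketch of three of these metrics in one widely used formulation each; the function names and example gains are hypothetical. DCG here uses the common log2(i + 1) discount (the original Järvelin & Kekäläinen formulation applies a log-base-b discount only from rank b onward, and other variants exist); NDCG divides by the DCG of an ideal ordering of all judged gains; RBP uses a persistence parameter p; ERR maps a grade g to a stopping probability (2^g − 1) / 2^g_max.

```python
import math

def dcg(gains, k=None):
    """DCG with the log2(i + 1) discount: sum of g_i / log2(i + 1)."""
    gains = gains[:k] if k is not None else gains
    return sum(g / math.log2(i + 1) for i, g in enumerate(gains, start=1))

def ndcg(gains, judged_gains, k=None):
    """DCG normalized by the DCG of the ideal (gain-sorted) ordering."""
    best = dcg(sorted(judged_gains, reverse=True), k)
    return dcg(gains, k) / best if best > 0 else 0.0

def rbp(rels, p=0.8):
    """Rank-biased precision: (1 - p) * sum of r_i * p**(i - 1)."""
    return (1 - p) * sum(r * p ** (i - 1) for i, r in enumerate(rels, start=1))

def err(grades, max_grade):
    """Expected reciprocal rank: the user stops at rank i with probability
    R_i = (2**g_i - 1) / 2**max_grade, having not stopped earlier."""
    score, p_reach = 0.0, 1.0
    for i, g in enumerate(grades, start=1):
        r = (2 ** g - 1) / 2 ** max_grade
        score += p_reach * r / i
        p_reach *= 1 - r  # probability of continuing past rank i
    return score

print(ndcg([3, 2, 0, 1], [3, 2, 1, 0]), rbp([1, 0, 1, 1], p=0.5), err([2, 0], 2))
```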


Why Do Users Search?

Economics says people act when they can exchange effort for utility, and that if they have a choice of alternatives and all other factors are equal, they will favor the option with the best conversion rate.

For search, utility is measured as relevance, or gain: possibly fractional, possibly context-dependent, and possibly personal.

Effort is measured in seconds or minutes (or perhaps brain-Watts), or approximated by surrogate units called documents inspected.


Why Do Users Search?

If effort can be represented by documents inspected,

and if all other things are equal,

then users will prefer the search service with the greatest expected gain per document inspected.

Because that is the conversion rate between effort and utility.
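A toy illustration of this conversion rate, assuming (for simplicity only) that a user inspects exactly n documents, so that gain per document inspected is total gain in the top n divided by n. The two services and their graded gains are hypothetical.

```python
def gain_per_document(gains, n):
    """Total gain from the first n documents, divided by n documents inspected."""
    return sum(gains[:n]) / n

# Hypothetical graded gains (0..1) for the top five answers of two services.
service_a = [1.0, 0.0, 0.5, 0.0, 0.0]
service_b = [0.5, 0.5, 0.5, 0.5, 0.0]

for n in (1, 3, 5):
    a, b = gain_per_document(service_a, n), gain_per_document(service_b, n)
    print(n, a, b, "A" if a > b else "B" if b > a else "tie")
```

Service A is preferred by a user who inspects one document, the two tie at three, and B is preferred at five. So which service offers the better rate depends on how many documents the user is expected to inspect.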


Audience Exercise (1)

Get a notepad on your smartphone ready (or find a pen!)...

While visiting Thailand for a beach holiday last year, you decided to visit some local museums to learn more about Thailand’s history. You learned many interesting things about the country, including that it was not always called Thailand. What was it called originally?

(a) How many useful web pages do you think you would need to complete the search task?
(b) How many different queries do you think you would need to enter to find that many useful pages?
(c) What would your first query be?


Audience Exercise (2)

You recently heard a commercial about the health benefits of eating algae, seaweed and kelp. This made you interested in finding out about the positive uses of marine vegetation, both as a source of food, and as a potentially useful drug.

(a) How many useful web pages do you think you would need to complete the search task?
(b) How many different queries do you think you would need to enter to find that many useful pages?
(c) What would your first query be?


Audience Exercise

The point?

For (a) useful documents, put up your hand if you had more documents expected for the “history of thailand” scenario than for the “marine vegetation” one.

For (b) number of queries, put up your hand if you had more queries expected for the “history of thailand” scenario than for the “marine vegetation” one.

Show your two (c) first queries to the person sitting next to you. Both of you put up your hand if both queries were exactly the same.


First Queries, 42 Crowd-Workers

history of thailand (×2), original name of thailand (×2), thailand (×3), thailand first name, thailand former name, thailand name, thailand original name (×3), thailand s history (×3), thailand s original name, thailand wiki (×2), thailands first name, thailands former name, what thailand was called originally, what was thailand called, what was thailand called originally (×9), what was thailand originally called (×6), what was thailand s original name, what was thailands original name (×2), what was the original name of thailand


First Queries, 47 Crowd-Workers

algae health benefits, algae seaweed kelp nutrition medicine, benefits of eating algae seaweed and kelp, benefits of marine vegetables, benefits to eating algae seaweed and kelp, different application of marine vegetation, edible seaweeds, finding out about the positive uses of marine vegetation, health benefits, health benefits of algae seaweed and kelp, health benefits of marine vegetation, health benefits of seaweed algae kelp food supply medical benefits, is sea veggies really good for you, marine vegetation, marine vegetation algae seaweed kelp, marine vegetation as food or drugs, marine vegetation benefits, marine vegetation food and a drug use, marine vegetation food and drugs, marine vegetation good for health, marine vegetation health benefits, marine vegetation positive effects, . . .


First Queries, 47 Crowd-Workers

. . . , marine vegetation positive uses, marine vegetation uses, positive uses of marine vegetation (×8), positive uses of marine vegetation as source of food, positive uses of marine vegetation both as a source of food and as a potentially useful drug (×2), research into health benefits of algae seaweed and kelp, the positive uses of marine vegetation, the uses of marine vegetation in food and drugs, uses of algae seaweed and kelp, uses of marine vegetation (×2), what are some good uses of maritime vegetation, what are the benefits of eating algae seaweed and kelp, what are the health benefits of eating algae seaweed and kelp, what are the health benefits of seaweed, what are the positive uses of marine vegetation as a food and a medical treatment, what is the benefit of eating algae seaweed and kelp.


Result Expectations

Information need descriptions (backstories) were written for 180 Q02, R03, and T04 topics (70, 60, and 50 respectively).

Backstories were categorized as one of [Kelly et al., 2015]:

- Remember: tasks that primarily involve factoid-style answers.
- Understand: tasks that involve the construction of meaning, for example through interpreting or exemplifying.
- Analyze: tasks that involve breaking material into parts, and making overall decisions based on how these facets relate to one another.


Result Expectations

After cleansing, 7,969 responses, averaging 44 per backstory.

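Topic                  Workers  Distinct queries  Avg. T  Avg. Q  Type
history of thailand         42                19    1.77    1.29  R
marine vegetation           47                38    3.89    2.94  U

T is the estimated number of useful pages (question (a)), Q the estimated number of queries (question (b)); Type is the backstory category (R = Remember, U = Understand).

(a) There is unpredictable variation in first queries.
(b) There is predictable variation in the T and Q estimates.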
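The workers and queries counts for the first topic can be cross-checked against the transcribed “history of thailand” query list above. This sketch assumes that the queries column counts distinct query strings and that each “(×n)” mark is a multiplicity; both are readings of the slides rather than stated definitions.

```python
from collections import Counter

# First queries for the "history of thailand" backstory, as transcribed on the
# slide; values are the "(×n)" multiplicities (1 where no mark is shown).
thailand_queries = Counter({
    "history of thailand": 2,
    "original name of thailand": 2,
    "thailand": 3,
    "thailand first name": 1,
    "thailand former name": 1,
    "thailand name": 1,
    "thailand original name": 3,
    "thailand s history": 3,
    "thailand s original name": 1,
    "thailand wiki": 2,
    "thailands first name": 1,
    "thailands former name": 1,
    "what thailand was called originally": 1,
    "what was thailand called": 1,
    "what was thailand called originally": 9,
    "what was thailand originally called": 6,
    "what was thailand s original name": 1,
    "what was thailands original name": 2,
    "what was the original name of thailand": 1,
})

responses = sum(thailand_queries.values())  # one first query per worker
distinct = len(thailand_queries)            # distinct query strings
print(responses, distinct)                  # 42 19
print(thailand_queries.most_common(1))
```

Even for this simple Remember-type backstory, the single most popular query was submitted by only 9 of the 42 workers.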


Distribution of T

[Figure: proportion of cases for each estimate of T, in three panels (Remember, Understand, Analyze); estimate bins 0, 1, 2, 3-5, 6-10, 11-100, 101+.]


Distribution of Q

[Figure: proportion of cases for each estimate of Q, in three panels (Remember, Understand, Analyze); estimate bins 1, 2, 3-5, 6-10, 11+.]


Goal-Sensitive Evaluation

Cooper [1968]:

“Most measures do not take into account a crucial variable: the amount of material relevant . . . which the user actually needs”.

“A search request is therefore to be conceived in the abstract as involving two parts: a relevance description (normally a subject specification) and a quantity specification. To put it another way, every search request has a definite quantification”.
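Cooper’s quantity specification suggests measuring how much wasted effort precedes satisfaction. The sketch below computes a simple search length: the number of non-relevant documents a user inspects before finding a requested quantity of relevant ones. It assumes a strictly ordered ranking with binary relevance. Cooper’s own expected search length also handles ties in weakly ordered output, which this sketch does not. The function name is hypothetical.

```python
from typing import Optional, Sequence

def search_length(rels: Sequence[int], quantity: int) -> Optional[int]:
    """Non-relevant documents inspected before `quantity` relevant ones are found.
    Returns None if the ranking never supplies that many relevant documents."""
    if quantity <= 0:
        return 0
    found = wasted = 0
    for r in rels:
        if r:
            found += 1
            if found == quantity:
                return wasted
        else:
            wasted += 1
    return None

ranking = [1, 0, 0, 1, 0, 1, 1]  # binary relevance by rank
for t in (1, 2, 4, 5):
    print(t, search_length(ranking, t))
```

The same ranking is perfect for a user who needs one relevant document, costs two wasted inspections for a user who needs two, and cannot satisfy a user who needs five.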


A Model for User Search Behavior

Back to the expected rate at which gain is accrued.

To compute an expectation, we need a probability distribution.

Let W(i) be the probability that the user examines the snippet/document at rank i in the ranking while viewing the SERP.

Can we take it as axiomatic that W(i) ≥ W(i + 1)?

A range of evidence says “yes, for the most part”. And that’s why we generate ranked lists, after all.
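A sketch of this idea, treating W as a weighting over ranks so that the expected rate of gain is the sum over i of W(i) · r_i. Two familiar weightings serve as examples: precision@k (weight 1/k on each of the first k ranks) and RBP’s geometric weights (1 − p)p^(i−1). Truncating RBP at a finite depth leaves its weights summing to slightly less than one. All helper names and gains are hypothetical.

```python
def precision_at_k_weights(k, depth):
    """P@k as a weighting: 1/k on each of the first k ranks, 0 after."""
    return [1.0 / k if i <= k else 0.0 for i in range(1, depth + 1)]

def rbp_weights(p, depth):
    """RBP's geometric weighting (1 - p) * p**(i - 1), truncated at `depth`."""
    return [(1 - p) * p ** (i - 1) for i in range(1, depth + 1)]

def is_non_increasing(w):
    """Check the W(i) >= W(i + 1) condition at every rank."""
    return all(a >= b for a, b in zip(w, w[1:]))

def expected_rate(w, gains):
    """Expected gain per document inspected: sum over ranks of W(i) * r_i."""
    return sum(wi * ri for wi, ri in zip(w, gains))

gains = [1, 0, 1, 1, 0, 0, 0, 0, 0, 0]  # hypothetical binary gains by rank
for name, w in [("P@5", precision_at_k_weights(5, 10)),
                ("RBP p=0.5", rbp_weights(0.5, 10))]:
    print(name, is_non_increasing(w), round(sum(w), 4),
          round(expected_rate(w, gains), 4))
```

Both weightings satisfy W(i) ≥ W(i + 1). They differ in how sharply attention falls away with rank, and so they can disagree on which of two rankings offers the better rate.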


Measuring Fixations


First Fixations, By Rank

User experiment: 34 users, 6 search topics each.

[Figure: proportion of cases by rank of first fixation, ranks 1 to 10.]

All Fixations, By Rank

User experiment: 34 users, 6 search topics each.

[Figure: proportion of cases (0.00 to 0.20) by rank of fixation, ranks 1 to 20.]

Click-Throughs, By Rank

User experiment: 34 users, 6 search topics each.

[Figure: proportion of cases (0.0 to 0.3) by rank of click, ranks 1 to 20.]

Fixation Progressions – Zero Order

Observed jump probabilities, expressed as fractions of a total of 2,633 overlapping two-fixation observations.

Jump:          < −4   −4     −3     −2     −1     +1     +2     +3     +4     > +4
Probability:   0.047  0.033  0.049  0.069  0.230  0.347  0.104  0.046  0.032  0.043

All negative jumps: 0.427; all positive jumps: 0.573.

Median 1.0, mean 0.15.
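As a quick arithmetic check on the table, the sketch below sums the two halves of the distribution and locates the bin that holds the median jump. The bin labels, including the open-ended "< −4" and "> +4" bins, are our reading of the flattened table, and the sign convention in the comments is an assumption. Because those two bins are open-ended, only the median, not the mean, can be recovered from the binned values.

```python
# Zero-order jump distribution, transcribed from the table above.
# Assumed sign convention: negative jumps move up the SERP (towards rank 1),
# positive jumps move down it.
JUMPS = ["< -4", "-4", "-3", "-2", "-1", "+1", "+2", "+3", "+4", "> +4"]
PROBS = [0.047, 0.033, 0.049, 0.069, 0.230, 0.347, 0.104, 0.046, 0.032, 0.043]


def median_bin(labels, probs):
    """Return the label of the bin that contains the 50th percentile."""
    half = sum(probs) / 2
    running = 0.0
    for label, prob in zip(labels, probs):
        running += prob
        if running >= half:
            return label
    return labels[-1]


negative = sum(PROBS[:5])   # all jumps up the ranking
positive = sum(PROBS[5:])   # all jumps down the ranking
print(f"negative {negative:.3f}, positive {positive:.3f}")
print("median jump:", median_bin(JUMPS, PROBS))
```

The two sums come out at 0.428 and 0.572 rather than 0.427 and 0.573, a difference consistent with rounding of the individual entries, and the median lands in the +1 bin, matching the stated median of 1.0.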

Fixation Progressions – First Order

[Figure: probability (0.0 to 0.5) of each next jump (−9 to +9), conditioned on the first jump (−9 to +9).]

A Model for User Search Behavior

Let W(i) be the probability that the user examines snippet/document i in the ranking while viewing the SERP.

Assume that the user is seeking a gain of T in regard to the current information need.

Assume that the user starts at rank 1, and proceeds through the SERP until they exit, obtaining gain of $r_i$ from rank i.

And let $T_i = T - \sum_{j=1}^{i} r_j$ be the gain still required after i documents have been viewed.

A Model for User Search Behavior

[Figure: flowchart of the user model. Info need ($T = ??$, $T_0 = T$) → Run query ($i = 0$) → repeatedly: $i = i + 1$, view item $i$, $T_i = T_{i-1} - r_i$ → Leave SERP.]

A Model for User Search Behavior

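[Figure: the user-model flowchart extended beyond a single SERP. Info need ($T = ??$, $T_0 = T$) → Run query ($i = 0$) → repeatedly: $i = i + 1$, view item $i$, $T_i = T_{i-1} - r_i$ → Leave SERP → Next page, or reformulate, or switch; or Exit search.]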

A Model for User Search Behavior

The metric value is then the weighted relevance:

$$M(r) = \sum_{i=1}^{\infty} W_M(i) \cdot r_i,$$

where $r_i$ is the real-valued (or binary) utility/gain at rank i.

The units for M(r) are “expected gain per document inspected”.
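Given any weight vector that forms a probability distribution over ranks, the metric is a dot product. A minimal sketch, assuming a finite list of weights (ranks beyond it carry no weight); the function name `weighted_precision` is illustrative. Precision@k serves as the example, since it places $W(i) = 1/k$ on each of the first k ranks.

```python
import math


def weighted_precision(weights, gains):
    """M(r) = sum_i W(i) * r_i, over the ranks covered by `weights`.

    `weights` must sum to 1 so that M(r) is an expected gain per document
    inspected; gains beyond len(weights) are ignored by zip().
    """
    if not math.isclose(sum(weights), 1.0):
        raise ValueError("W(i) must form a probability distribution")
    return sum(w * r for w, r in zip(weights, gains))


k = 5
gains = [1, 0, 1, 1, 0, 1]   # r_1 .. r_6; rank 6 is outside the Precision@5 window
w_prec = [1 / k] * k         # Precision@k: W(i) = 1/k for i = 1..k
print(weighted_precision(w_prec, gains))   # approximately 0.6
```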

A Model for User Search Behavior

An alternative way of thinking about the situation:

$$C_M(i) = \frac{W_M(i+1)}{W_M(i)},$$

the conditional continuation probability at rank i.
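The two views carry the same information: C can be read off W, and W can be rebuilt from C by taking running products and normalizing. A minimal sketch, assuming a finite ranking in which every listed W(i) is positive and the user stops after the last listed rank (so the final C is 0); both function names are hypothetical.

```python
def continuation_from_weights(weights):
    """C(i) = W(i+1) / W(i); the rank after the last one listed has W = 0."""
    padded = list(weights) + [0.0]
    return [padded[i + 1] / padded[i] for i in range(len(weights))]


def weights_from_continuation(cont):
    """W(i) is proportional to C(1) * ... * C(i-1), normalized to sum to 1."""
    reach = [1.0]                 # every user views rank 1
    for c in cont[:-1]:
        reach.append(reach[-1] * c)
    total = sum(reach)
    return [v / total for v in reach]


W = [0.4, 0.3, 0.2, 0.1]
C = continuation_from_weights(W)       # roughly [0.75, 0.667, 0.5, 0.0]
print(C)
print(weights_from_continuation(C))    # recovers W, up to rounding
```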

A Model for User Search Behavior

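[Figure: the user-model flowchart annotated with probabilities. Info need ($T = ??$, $T_0 = T$) → Run query ($i = 0$) → $i = i + 1$, view item $i$, $T_i = T_{i-1} - r_i$ → view the next item with probability $C(i) = W(i+1)/W(i)$, or Leave SERP with probability $1 - C(i)$.]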
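The flowchart can be run directly as a simulation. A minimal sketch, assuming a single SERP, gains $r_i$ in [0, 1], and a user-supplied continuation function of $(i, r_i, T_i)$; that signature is our own choice, not from the source, and ranks beyond the supplied gains are treated as non-relevant. With RBP's constant $C(i) = p$, the mean number of items viewed should approach $\sum_i \prod_{j<i} C(j) = 1/(1-p)$.

```python
import random


def simulate_session(continuation, gains, T, rng, max_depth=10_000):
    """Follow the flowchart once; return the number of items viewed."""
    i, T_i = 0, T                          # Run query: i = 0, T_0 = T
    while i < max_depth:
        i += 1                             # i = i + 1, view item i
        r_i = gains[i - 1] if i <= len(gains) else 0.0
        T_i -= r_i                         # T_i = T_{i-1} - r_i
        if rng.random() >= continuation(i, r_i, T_i):
            break                          # leave SERP with probability 1 - C(i)
    return i


def mean_depth(continuation, gains, T, trials=100_000, seed=1):
    """Average number of items viewed over many simulated sessions."""
    rng = random.Random(seed)
    return sum(simulate_session(continuation, gains, T, rng)
               for _ in range(trials)) / trials


p = 0.8
print(mean_depth(lambda i, r_i, T_i: p, gains=[], T=1))   # close to 1/(1-p) = 5
```

Passing $r_i$ and $T_i$ to the continuation function is what lets the same loop express the adaptive models that follow; reciprocal rank's $C(i) = 1 - r_i$, for example, always stops at the first relevant item.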

User Models – Examples

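(1) Precision@k: $C_{\mathrm{Prec}}(i) = 1$ for $i < k$, and 0 otherwise.

(2) Scaled DCG at k, SDCG@k:

$$C_{\mathrm{SDCG}}(i) = \frac{\log(i+1)}{\log(i+2)}$$

when $1 \le i < k$, and 0 when $i \ge k$. It must be truncated to a fixed depth k if $W_{\mathrm{SDCG}}(i)$ is to be a probability distribution, since the untruncated weights, proportional to $1/\log(i+1)$, have no finite sum.

(3) Rank-biased precision: $C_{\mathrm{RBP}}(i) = p$.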
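Multiplying out the continuation probabilities shows the weights these three models imply. The products telescope to 1 for Precision@k (for $i \le k$), to $\log 2 / \log(i+1) = 1/\log_2(i+1)$ for SDCG@k (the familiar DCG discount), and to $p^{i-1}$ for RBP; normalizing then gives $W(i)$. A small numerical check, with k = 5 and p = 0.8 chosen arbitrarily:

```python
import math

K, P = 5, 0.8   # illustrative parameter choices


def c_prec(i):
    return 1.0 if i < K else 0.0


def c_sdcg(i):
    return math.log(i + 1) / math.log(i + 2) if i < K else 0.0


def c_rbp(i):
    return P


def reach(cont, i):
    """Unnormalized W(i): the product C(1) * ... * C(i-1)."""
    prod = 1.0
    for j in range(1, i):
        prod *= cont(j)
    return prod


for i in range(1, K + 2):
    print(i, reach(c_prec, i), round(reach(c_sdcg, i), 4), round(reach(c_rbp, i), 4))
```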

Adaptive User Models

But, hang on...

Do users really have the same continuation probability, regardless of what they have already seen?

Wouldn’t it be better to take $r_i$ into account when determining C(i)?

Adaptive User Models – Examples

(4) Reciprocal rank is defined for $r_i \in \{0, 1\}$:

$$C_{\mathrm{RR}}(i) = 1 - r_i.$$

(5) Expected reciprocal rank is defined for $r_i \in [0, 1]$:

$$C_{\mathrm{ERR}}(i) = 1 - r_i.$$

(6) Average precision is defined by

$$C_{\mathrm{AP}}(i) = \frac{\sum_{j=i+1}^{\infty} r_j / j}{\sum_{j=i}^{\infty} r_j / j}.$$
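The adaptive definitions can be checked the same way. A minimal sketch, assuming binary gains and a finite ranking that contains every relevant document, so that AP's normalization by the number of relevant items found equals R. When no relevance remains at or below rank i, $C_{\mathrm{AP}}(i)$ is taken to be 0, which sidesteps the 0/0 in the definition. Normalizing the running products of C and taking $\sum_i W(i)\,r_i$ then reproduces reciprocal rank and average precision; the helper names are hypothetical.

```python
def cwl_score(cont, gains):
    """sum_i W(i) r_i, with W(i) proportional to C(1) * ... * C(i-1)."""
    reach, prod = [], 1.0
    for c in cont:
        reach.append(prod)      # probability of reaching rank i
        prod *= c
    return sum(v * r for v, r in zip(reach, gains)) / sum(reach)


def c_rr(gains):
    """C_RR(i) = 1 - r_i: stop at the first relevant item."""
    return [1 - r for r in gains]


def c_ap(gains):
    """C_AP(i) = (sum over j > i of r_j/j) / (sum over j >= i of r_j/j)."""
    n = len(gains)
    tail = [0.0] * (n + 1)      # tail[i] = sum of r_j / j over ranks j >= i + 1
    for i in range(n - 1, -1, -1):
        tail[i] = tail[i + 1] + gains[i] / (i + 1)
    return [tail[i + 1] / tail[i] if tail[i] > 0 else 0.0 for i in range(n)]


ranking = [1, 0, 1, 0, 0, 1]
print(cwl_score(c_rr(ranking), ranking))   # 1.0: first relevant item at rank 1
print(cwl_score(c_ap(ranking), ranking))   # (1/1 + 2/3 + 3/6) / 3 = 13/18
```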

Sensitive User Models

But, hang on...

Do users really have the same continuation probability, regardless of what they had initially hoped to find?

Wouldn’t it be better to take T or $T_i$ (or both) into account when determining C(i)?

Adaptive and Sensitive User Models

Is there a formulation for C(i) that:

• is adaptive, and reacts to relevance found
• is sensitive, and can be adjusted to user goals
• is computationally tractable, so that upper and lower bounds can be calculated over prefixes of the ranking
• is complete, and can be computed even when R = 0
• is plausible in terms of the user behavior it models?

Adaptive and Sensitive User Models

Define INST by

$$C_{\mathrm{INST}}(i) = \left(\frac{i + T + T_i - 1}{i + T + T_i}\right)^2.$$

Where does this come from? And what does it achieve?
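A minimal sketch of INST as a weighted-precision metric over a finite ranking. Two assumptions are ours, not the source's: ranks after the supplied gains (up to an optional `depth`) are treated as non-relevant, and ranks beyond that depth are ignored, so the score only approximates the untruncated value when the depth is large. $T_i$ is updated with $r_i$ before $C(i)$ is evaluated, following the definition of $T_i$ as the gain still required after i documents. Both function names are hypothetical.

```python
def c_inst(i, T, T_i):
    """C_INST(i) = ((i + T + T_i - 1) / (i + T + T_i)) ** 2."""
    x = i + T + T_i
    return ((x - 1) / x) ** 2


def inst_score(gains, T, depth=None):
    """INST = sum_i W(i) r_i, with W(i) proportional to C(1) * ... * C(i-1).

    Assumption: ranks after the supplied gains, up to `depth`, count as r_i = 0.
    """
    depth = max(depth or 0, len(gains))
    gains = list(gains) + [0.0] * (depth - len(gains))
    if not gains:
        return 0.0
    reach, prod, T_i = [], 1.0, T
    for i, r in enumerate(gains, start=1):
        reach.append(prod)            # probability of reaching rank i
        T_i -= r                      # T_i = T - (r_1 + ... + r_i)
        prod *= c_inst(i, T, T_i)     # continue past rank i with probability C(i)
    return sum(v * r for v, r in zip(reach, gains)) / sum(reach)


print(inst_score([1, 0, 0, 0, 0], T=3, depth=1000))   # relevant item at rank 1
print(inst_score([0, 0, 0, 0, 1], T=3, depth=1000))   # same item at rank 5: lower score
```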

INST – Adaptive and Sensitive

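$W_{\mathrm{INST}}(i)$ is, roughly, proportional to $\dfrac{1}{(i + T + T_i - 1)^2}$.

The weight of each item is inversely related to all of:

• the depth in the ranking
• the number of useful items initially sought
• the number of useful items still being sought.

The ratio in $C_{\mathrm{INST}}$ is squared so that the weights form a convergent series: the sum of $1/x$ diverges, whereas the sum of $1/x^2$ does not.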
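Why only “roughly”: writing $x_i = i + T + T_i$ (so $x_0 = 2T$), a run of non-relevant items makes the product of continuation probabilities telescope. When $r_j = 0$ for every $j \le i$, each $x_j - 1 = x_{j-1}$ and $T_i = T$, so

```latex
\prod_{j=1}^{i-1} C_{\mathrm{INST}}(j)
  = \prod_{j=1}^{i-1} \left(\frac{x_j - 1}{x_j}\right)^{2}
  = \prod_{j=1}^{i-1} \left(\frac{x_{j-1}}{x_j}\right)^{2}
  = \left(\frac{x_0}{x_{i-1}}\right)^{2}
  = \frac{(2T)^{2}}{(i + T + T_i - 1)^{2}} .
```

After normalization, W(i) is then exactly proportional to $1/(i + T + T_i - 1)^2$. A relevant item at rank j leaves $x_j = x_{j-1}$, so its factor $((x_j - 1)/x_j)^2$ does not cancel, and that is where the proportionality becomes approximate.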

INST – Adaptive and Sensitive

Consequences of the definition:

• If $r_i = 0$, then $C(i) > C(i-1)$. The more the user has invested in a search, the more likely they are to continue it.
• If $r_i = 1$, then $C(i) = C(i-1)$. As users make progress toward their goal, their status remains constant.
• It is always the case that $0 < C(i) < 1$. The user might end the search at any point.
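The three consequences follow directly from the definition. Writing $x_i = i + T + T_i$ and using $T_i = T_{i-1} - r_i$:

```latex
x_i = x_{i-1} + 1 - r_i ,
\qquad
C_{\mathrm{INST}}(i) = \left(1 - \frac{1}{x_i}\right)^{2} .
```

Since $(1 - 1/x)^2$ is strictly increasing for $x > 1$, $r_i = 0$ gives $x_i = x_{i-1} + 1$ and hence $C(i) > C(i-1)$, while $r_i = 1$ gives $x_i = x_{i-1}$ and hence $C(i) = C(i-1)$. Assuming gains $r_j \in [0, 1]$, $x_i = i + 2T - \sum_{j \le i} r_j \ge 2T$, which exceeds 1 for any $T \ge 1$, so $0 < C(i) < 1$.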

INST – Adaptive and Sensitive

Expected search length for INST, for $r_i \equiv 0$ and $r_i \equiv 1$.

T     Upper ($r_i \equiv 0$)   Lower ($r_i \equiv 1$)
1      2.58                      1.33
3      6.53                      3.27
10    20.51                     10.26
30    60.50                     30.25

For any given value of T, all other rankings fall between the two limits in terms of expected search depth.
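The table can be reproduced numerically. The expected number of items viewed is $\sum_i \prod_{j<i} C(j)$, the probability of reaching each rank summed over ranks. A minimal sketch, assuming gains held constant at 0 or 1 and a truncation depth of one million ranks (our choice). With $r_i \equiv 1$ the continuation probability is the constant $((2T-1)/2T)^2$ and the sum is geometric; with $r_i \equiv 0$ the reach probabilities decay only as $1/i^2$, so the truncated sum is slightly low for large T.

```python
def expected_search_length(T, gain, depth=1_000_000):
    """Expected number of items viewed under INST when every r_i equals `gain`."""
    total, reach, T_i = 0.0, 1.0, T
    for i in range(1, depth + 1):
        total += reach                    # P(user reaches rank i)
        T_i -= gain                       # T_i = T - i * gain
        x = i + T + T_i
        reach *= ((x - 1) / x) ** 2       # C_INST(i)
        if reach < 1e-15:                 # remaining tail is negligible
            break
    return total


for T in (1, 3, 10, 30):
    upper = expected_search_length(T, gain=0)
    lower = expected_search_length(T, gain=1)
    print(f"T={T:>2}  upper={upper:6.2f}  lower={lower:6.2f}")
```

The printed values should match the table to two decimal places; for T = 30 with $r_i \equiv 0$ the truncated tail is worth only a few thousandths.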

INST – Adaptive and Sensitive

$C_{\mathrm{INST}}(i)$ with $r_i$ always 0 and always 1, for T = 3.

[Figure: conditional probability C(i) (0.60 to 1.00) against rank i (1 to 100, log scale), with curves for T=3, r_i=0 and T=3, r_i=1.]

INST – Adaptive and Sensitive

$C_{\mathrm{INST}}(i)$ with $r_i$ always 0 and always 1, for T = 10.

[Figure: conditional probability C(i) (0.60 to 1.00) against rank i (1 to 100, log scale), with curves for T=10, r_i=0 and T=10, r_i=1.]

INST – Adaptive and Sensitive

$W_{\mathrm{INST}}(i)$ with $r_i$ always 0 and always 1, for T = 3.

[Figure: weight W(i) (0.01 to 1.00, log scale) against rank i (1 to 100, log scale), with curves for T=3, r_i=0 and T=3, r_i=1.]

INST – Adaptive and Sensitive

$W_{\mathrm{INST}}(i)$ with $r_i$ always 0 and always 1, for T = 10.

[Figure: weight W(i) (0.01 to 1.00, log scale) against rank i (1 to 100, log scale), with curves for T=10, r_i=0 and T=10, r_i=1.]

Residuals

A benefit of weighted-precision metrics is that residuals can be computed – the sum of the W(i) values for which $r_i$ is unknown.

The residual associated with a score represents the gap between an “all unjudged are non-relevant ($r_i = 0$)” assessment, and an “all unjudged are relevant ($r_i = 1$)” assessment.

When residuals are large, the evaluation must be regarded as having low credibility.
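For a static model such as RBP, where $W(i) = (1-p)\,p^{i-1}$ does not depend on the gains, the residual is simply the weight on unjudged ranks plus the weight beyond the end of the judged list, which is $p^n$ for a list of length n. A minimal sketch, assuming unjudged items are marked `None`; the helper name is hypothetical.

```python
def rbp_with_residual(gains, p):
    """RBP score and residual over a judged prefix of a ranking.

    `gains[i-1]` is r_i in [0, 1], or None if the item is unjudged.
    The score counts unjudged items as r_i = 0; score + residual is the
    value if every unjudged item, including all ranks past the end of
    the list, had r_i = 1.
    """
    score = residual = 0.0
    for i, r in enumerate(gains, start=1):
        w = (1 - p) * p ** (i - 1)        # W_RBP(i)
        if r is None:
            residual += w
        else:
            score += w * r
    residual += p ** len(gains)           # weight of every rank after the list
    return score, residual


print(rbp_with_residual([1, None, 0], p=0.5))   # (0.5, 0.375)
```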

Query Variation in Action

Topic R03.356, “postmenopausal estrogen Britain”, 46 queries, 29 distinct, Indri-SDM similarity, NIST qrels, INST.

[Figure: score (0.0 to 1.0) for each query variant (0 to 50), showing the score, the residual, and the title query.]

Query Variation in Action

Topic T04.734, “recycling successes”, 44 queries, 25 distinct, Indri-SDM similarity, NIST qrels, INST.

[Figure: score (0.0 to 1.0) for each query variant (0 to 50).]

Query Variation in Action

A total of 110 R03 and T04 topics, showing the fraction of user-generated queries that outperform the “title” query.

[Figure: best user query score (INST) against TREC title-only query score (INST), both 0.0 to 1.0; each topic is marked by the fraction of user queries that outperform the title query: > 75%, > 50%, > 25%, > 0%, <= 0%.]

Query Variation in Action

Pool growth as users are added (60 R03 topics): documents per topic to be judged ≈ $d \cdot n^{0.7}$, versus ≈ $d \cdot n^{0.5}$ for systems.

[Figure: pool size (10 to 1000, log scale) against number of users included (1 to 100, log scale), for d = 10, 20, 50, 100.]
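To get a feel for the two growth rates, the fitted relationships can be evaluated directly. A minimal sketch, reading d as the pool depth (the d = 10 to 100 curves in the figure) and n as the number of users or systems pooled; both readings, and the function name, are our assumptions, and the exponents are the approximate fitted values quoted on the slide.

```python
def pool_size(d, n, exponent):
    """Approximate documents per topic to be judged: d * n ** exponent."""
    return d * n ** exponent


for n in (1, 10, 100):
    users = pool_size(100, n, 0.7)     # pooling user query variants
    systems = pool_size(100, n, 0.5)   # pooling systems
    print(f"n={n:>3}  users~{users:7.0f}  systems~{systems:7.0f}")
```

At d = 100 and n = 100 this gives roughly 2,500 documents per topic for user variants, against 1,000 for systems.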

Recall-Based Metrics

For some purposes we may wish to measure retrieval performance relative to what a perfect system would attain.

The metrics AP, NDCG, and Q-Measure normalize by R, the number of relevant documents for the topic.

By definition, recall-based metrics are divorced from the user experience. Scores based on partial judgments are not lower bounds; nor can residuals be calculated.
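A small illustration of why partial-judgment scores from recall-based metrics are not lower bounds: judging more documents can lower the score. A sketch using binary-relevance AP, with a hypothetical scenario in which further judging uncovers one more relevant document that the ranking did not retrieve, raising R from 2 to 3 while the ranking itself is unchanged.

```python
def average_precision(gains, R):
    """Binary AP: sum of precision at each relevant rank, divided by R."""
    hits, total = 0, 0.0
    for i, r in enumerate(gains, start=1):
        if r:
            hits += 1
            total += hits / i              # precision at rank i
    return total / R if R else 0.0


ranking = [1, 0, 1]
before = average_precision(ranking, R=2)   # two relevant documents known
after = average_precision(ranking, R=3)    # a third found by further judging
print(round(before, 4), round(after, 4))   # the score drops as judgments grow
```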

Summary

There is a relationship between information-seeking task complexity and anticipated values of T and Q.

Making reasonable assumptions, it is possible to construct a weighted-precision user model – and hence evaluation metric, INST – that is both goal-sensitive and adaptive.

There is vast variation in user queries that is not catered for by current test collections. That means that we are not (yet) able to determine if INST (with per-query T) genuinely offers more refined evaluations than, say, RBP (single p across topics) or ERR (no parameter at all).

Future Work

Things we would love to do:

• Build a test collection that incorporates query variants: see SIGIR 2016.
• Develop an evaluation using either crowd-workers or laboratory subjects to distinguish between the search behaviors modeled by, say, RBP and INST.
• Develop an understanding of query variants and their underlying “potency”, as a step towards an effective query rewriting tool.

Contributors

The majority of the work described here has been undertaken in collaboration with Peter Bailey (Microsoft), Falk Scholer (RMIT University), and Paul Thomas (CSIRO and now Microsoft).

Previous collaborators in this area include William Webber and Justin Zobel (University of Melbourne), and Shane Culpepper (RMIT University).

Xiaolu Lu provided technical assistance with some of the query processing.

Funded by the Australian Research Council (DP110101934 and DP140102655).

Questions (Who? When? Where?)

Sources

This talk was based on a suite of published work:

• Models and Metrics: IR Evaluation as a User Process, ADCS’12
• Users Versus Models: What Observation Tells Us About Effectiveness Metrics, CIKM’13
• What Users Do: The Eyes Have It, AIRS’13
• User Variability and IR System Evaluation, SIGIR’15
• Pooled Evaluation Over Query Variations: Users are as Diverse as Systems, CIKM’15
• INST: An Adaptive Metric for Information Retrieval Evaluation, ADCS’15
• UQV: A Test Collection with Query Variability, SIGIR’16

