Introduction to Information Retrieval
CS276: Information Retrieval and Web Search
Pandu Nayak and Prabhakar Raghavan
Lecture 8: Evaluation
Measures for a search engine
- How fast does it index?
  - Number of documents/hour
  - (Average document size)
- How fast does it search?
  - Latency as a function of index size
- Expressiveness of query language
  - Ability to express complex information needs
  - Speed on complex queries
- Uncluttered UI
- Is it free?
Sec. 8.6
Measures for a search engine
- All of the preceding criteria are measurable:
  - we can quantify speed/size
  - we can make expressiveness precise
- The key measure: user happiness. What is this?
  - Speed of response/size of index are factors
  - But blindingly fast, useless answers won't make a user happy
- Need a way of quantifying user happiness
Sec. 8.6
Happiness: elusive to measure
- Most common proxy: relevance of search results
- But how do you measure relevance?
- We will detail a methodology here, then examine its issues
- Relevance measurement requires 3 elements:
  1. A benchmark document collection
  2. A benchmark suite of queries
  3. A usually binary assessment of either Relevant or Nonrelevant for each query and each document
     - Some work on more-than-binary, but it is not the standard
Sec. 8.1
Evaluating an IR system
- Note: the information need is translated into a query
- Relevance is assessed relative to the information need, not the query
- E.g., Information need: I'm looking for information on whether drinking red wine is more effective at reducing your risk of heart attacks than white wine.
- Query: wine red white heart attack effective
- Evaluate whether the doc addresses the information need, not whether it has these words
Sec. 8.1
Standard relevance benchmarks
- TREC: the National Institute of Standards and Technology (NIST) has run a large IR test bed for many years
- Reuters and other benchmark doc collections are used
- Retrieval tasks specified, sometimes as queries
- Human experts mark, for each query and for each doc, Relevant or Nonrelevant
  - or at least for the subset of docs that some system returned for that query
Sec. 8.2
Unranked retrieval evaluation: Precision and Recall
- Precision: fraction of retrieved docs that are relevant = P(relevant|retrieved)
- Recall: fraction of relevant docs that are retrieved = P(retrieved|relevant)

                 Relevant   Nonrelevant
  Retrieved      tp         fp
  Not Retrieved  fn         tn

- Precision P = tp/(tp + fp)
- Recall    R = tp/(tp + fn)
Sec. 8.3
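A minimal Python sketch of these two formulas (the function names and example counts are illustrative, not from the slides):

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of retrieved docs that are relevant: tp / (tp + fp)."""
    return tp / (tp + fp) if tp + fp else 0.0

def recall(tp: int, fn: int) -> float:
    """Fraction of relevant docs that are retrieved: tp / (tp + fn)."""
    return tp / (tp + fn) if tp + fn else 0.0

# A system retrieves 4 docs, 3 of them relevant, and misses 3 relevant docs:
print(precision(tp=3, fp=1))  # 0.75
print(recall(tp=3, fn=3))     # 0.5
```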
Unranked retrieval evaluation: Precision and Recall
- Precision: the ability to retrieve top-ranked documents that are mostly relevant.
- Recall: the ability of the search to find all of the relevant items in the corpus.
Sec. 8.3
Precision/Recall
- You can get high recall (but low precision) by retrieving all docs for all queries!
- Recall is a non-decreasing function of the number of docs retrieved
- In a good system, precision decreases as either the number of docs retrieved or recall increases
  - This is not a theorem, but a result with strong empirical confirmation
Sec. 8.3
Trade-off between Recall and Precision
[Figure: precision (y-axis) vs. recall (x-axis), both from 0 to 1. Annotations: "The ideal" at the upper right; one end of the curve returns relevant documents but misses many useful ones too; the other end returns most relevant documents but includes lots of junk.]
F-Measure
- One measure of performance that takes into account both recall and precision.
- Harmonic mean of recall and precision:

  $F = \frac{2PR}{P + R}$

- Compared to the arithmetic mean, both P and R need to be high for the harmonic mean to be high.
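A short sketch of this formula in Python (illustrative names; not from the slides), with an example of why the harmonic mean punishes imbalance:

```python
def f_measure(p: float, r: float) -> float:
    """Harmonic mean of precision and recall: F = 2PR / (P + R)."""
    return 2 * p * r / (p + r) if p + r else 0.0

# With P = 1.0 and R = 0.1 the arithmetic mean is 0.55,
# but the harmonic mean collapses toward the smaller value:
print(f_measure(1.0, 0.1))  # 0.1818...
```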
E Measure (parameterized F Measure)
- A variant of F measure that allows weighting emphasis on precision over recall:

  $E = \frac{(\beta^2 + 1) P R}{\beta^2 P + R}$

- Value of β controls the trade-off:
  - β = 1: Equally weight precision and recall (E = F).
  - β > 1: Weight recall more.
  - β < 1: Weight precision more.
Sec. 8.3
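A sketch of this parameterized measure under the slide's definition (the function name is mine):

```python
def e_measure(p: float, r: float, beta: float = 1.0) -> float:
    """E = (beta^2 + 1) * P * R / (beta^2 * P + R).

    beta = 1 reduces to the harmonic-mean F above; beta > 1 favors
    recall, beta < 1 favors precision.
    """
    denom = beta ** 2 * p + r
    return (beta ** 2 + 1) * p * r / denom if denom else 0.0

print(e_measure(0.5, 0.8))            # 0.615..., same as F with beta = 1
print(e_measure(0.5, 0.8, beta=2.0))  # 0.714..., recall dominates
```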
Ranked retrieval evaluation: Precision and Recall
- Precision: fraction of retrieved docs that are relevant = P(relevant|retrieved)
- Recall: fraction of relevant docs that are retrieved = P(retrieved|relevant)

                 Relevant   Nonrelevant
  Retrieved      tp         fp
  Not Retrieved  fn         tn

- Precision P = tp/(tp + fp)
- Recall    R = tp/(tp + fn)
Sec. 8.3
Computing Recall/Precision Points
- For a given query, produce the ranked list of retrievals.
- Adjusting a threshold on this ranked list produces different sets of retrieved documents, and therefore different recall/precision measures.
- Mark each document in the ranked list that is relevant according to the gold standard.
- Compute a recall/precision pair for each position in the ranked list that contains a relevant document, as in the sketch below.
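A minimal implementation of this procedure (function and parameter names are mine, not from the slides): it walks the ranked list once and records one point per relevant document.

```python
def recall_precision_points(relevant_flags, total_relevant):
    """Walk a ranked list top to bottom; relevant_flags[i] is True if
    the document at rank i+1 is relevant. Emit one (recall, precision)
    pair at every rank that holds a relevant document."""
    points = []
    hits = 0
    for rank, is_relevant in enumerate(relevant_flags, start=1):
        if is_relevant:
            hits += 1
            points.append((hits / total_relevant, hits / rank))
    return points
```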
Example 1

  n    doc #   relevant
  1    588     x
  2    589     x
  3    576
  4    590     x
  5    986
  6    592     x
  7    984
  8    988
  9    578
  10   985
  11   103
  12   591
  13   772     x
  14   990

Let total # of relevant docs = 6. Check each new recall point:
- R=1/6=0.167; P=1/1=1
- R=2/6=0.333; P=2/2=1
- R=3/6=0.5;   P=3/4=0.75
- R=4/6=0.667; P=4/6=0.667
- R=5/6=0.833; P=5/13=0.38
Missing one relevant document, so we never reach 100% recall.
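Applying the recall_precision_points sketch from the previous slide to this ranking reproduces the five points (flags is just the "relevant" column as booleans):

```python
# Relevance flags for Example 1's ranking (x at ranks 1, 2, 4, 6, 13);
# the collection holds 6 relevant documents in total.
flags = [True, True, False, True, False, True, False,
         False, False, False, False, False, True, False]
for r, p in recall_precision_points(flags, total_relevant=6):
    print(f"R={r:.3f}  P={p:.3f}")
# The sixth relevant document never appears, so recall stops at 0.833.
```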
Example 2

  n    doc #   relevant
  1    588     x
  2    576
  3    589     x
  4    342
  5    590     x
  6    717
  7    984
  8    772     x
  9    321     x
  10   498
  11   113
  12   628
  13   772
  14   592     x

Let total # of relevant docs = 6. Check each new recall point:
- R=1/6=0.167; P=1/1=1
- R=2/6=0.333; P=2/3=0.667
- R=3/6=0.5;   P=3/5=0.6
- R=4/6=0.667; P=4/8=0.5
- R=5/6=0.833; P=5/9=0.556
- R=6/6=1.0;   P=6/14=0.429
Interpolating a Recall/Precision Curve
- Interpolate a precision value for each standard recall level:
  - $r_j \in \{0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0\}$
  - $r_0 = 0.0,\ r_1 = 0.1,\ \ldots,\ r_{10} = 1.0$
- The interpolated precision at the j-th standard recall level is the maximum known precision at any recall level between the j-th and (j + 1)-th level:

  $P(r_j) = \max_{r_j \le r \le r_{j+1}} P(r)$
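A sketch of this interpolation in Python. It uses the common TREC-style ceiling rule (maximum precision at any recall at or above the standard level, 0 if no observed point reaches that recall), which agrees with the slide's rule wherever the slide's interval is non-empty; the names are mine.

```python
def interpolated_precision(points):
    """Interpolated precision at the 11 standard recall levels
    0.0, 0.1, ..., 1.0: for each level, take the maximum precision
    observed at any recall >= that level (0.0 if none reaches it)."""
    return [max((p for r, p in points if r >= j / 10), default=0.0)
            for j in range(11)]

# (recall, precision) points from Example 1:
example1 = [(0.167, 1.0), (0.333, 1.0), (0.5, 0.75),
            (0.667, 0.667), (0.833, 0.385)]
print(interpolated_precision(example1))
# [1.0, 1.0, 1.0, 1.0, 0.75, 0.75, 0.667, 0.385, 0.385, 0.0, 0.0]
```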
Example 1
[Figure: interpolated recall/precision curve for Example 1; recall on the x-axis and precision on the y-axis, both from 0.0 to 1.0.]
Example 2
[Figure: interpolated recall/precision curve for Example 2; recall on the x-axis and precision on the y-axis, both from 0.0 to 1.0.]
Compare Two or More Systems
- The curve closest to the upper right-hand corner of the graph indicates the best performance
[Figure: recall/precision curves for a stemmed ("Stem") and an unstemmed ("NoStem") system; recall on the x-axis (0.1 to 1), precision on the y-axis (0 to 1).]
Sample RP Curve for CF Corpus
[Figure: sample recall/precision curve for the CF corpus.]
R-Precision
- Precision at the R-th position in the ranking of results for a query that has R relevant documents.

  n    doc #   relevant
  1    588     x
  2    589     x
  3    576
  4    590     x
  5    986
  6    592     x
  7    984
  8    988
  9    578
  10   985
  11   103
  12   591
  13   772     x
  14   990

R = # of relevant docs = 6
R-Precision = 4/6 = 0.67
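A minimal, self-contained sketch of this measure (names are mine): count the relevant documents among the top R ranks.

```python
def r_precision(relevant_flags, total_relevant):
    """Precision at rank R, where R is the total number of relevant
    documents for the query: relevant hits in the top R, over R."""
    return sum(relevant_flags[:total_relevant]) / total_relevant

# The ranking above: 4 of the top R = 6 docs are relevant -> 4/6
flags = [True, True, False, True, False, True, False,
         False, False, False, False, False, True, False]
print(r_precision(flags, 6))  # 0.666...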
Mean Average Precision (MAP)
- Average Precision: the average of the precision values at the points at which each relevant document is retrieved (0 for relevant documents never retrieved).
  - Ex1: (1 + 1 + 0.75 + 0.667 + 0.38 + 0)/6 = 0.633
  - Ex2: (1 + 0.667 + 0.6 + 0.5 + 0.556 + 0.429)/6 = 0.625
- Mean Average Precision: the average of the average precision values for a set of queries.
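A sketch of both quantities, reusing the recall_precision_points function from the earlier slide; dividing by total_relevant (not by the number of retrieved relevant docs) is what makes each missed relevant document contribute 0.

```python
def average_precision(relevant_flags, total_relevant):
    """Average of the precision values at each relevant document;
    relevant documents never retrieved contribute precision 0."""
    pts = recall_precision_points(relevant_flags, total_relevant)
    return sum(p for _, p in pts) / total_relevant

def mean_average_precision(queries):
    """queries: list of (relevant_flags, total_relevant), one per query."""
    return sum(average_precision(f, n) for f, n in queries) / len(queries)

# Example 1 gives (1 + 1 + 0.75 + 0.667 + 0.385 + 0)/6 = 0.633,
# matching the slide's Ex1 value.
```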