7/30/2019 Analysis of Precision and Recall
A Statistical Analysis of the Precision-Recall Graph
Ralf Herbrich
Microsoft Research
UK
Joint work with Hugo Zaragoza and Simon Hill
Overview
The Precision-Recall Graph
A Stability Analysis
Main Result
Discussion and Applications
Conclusions
Features of Ranking Learning
We cannot take differences of ranks.
We cannot ignore the order of ranks.
Point-wise loss functions do not capture the ranking performance!
ROC or precision-recall curves do capture the ranking performance.
We need generalisation error bounds for ROC and precision-recall curves!
Precision and Recall
Given: a sample z = ((x_1, y_1), ..., (x_m, y_m)) ∈ (X × {0,1})^m with k positive y_i, together with a function f: X → R.
Ranking the sample: re-order so that f(x_(1)) ≥ ... ≥ f(x_(m)).
Record the indices i_1, ..., i_k of the positive y_(j).
Precision p_j and recall r_j at the j-th positive example:
  r_j = j / k,    p_j = j / i_j
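The formulas on this slide were an image; under the standard definitions, recall r_j = j/k and precision p_j = j/i_j at the j-th positive example, they can be computed as in this sketch (function and variable names are mine, not from the slides):

```python
# Compute the precision/recall point at each positive example of a ranked
# sample, per r_j = j/k and p_j = j/i_j (i_j = rank of the j-th positive).

def precision_recall_points(scores, labels):
    """Return a list of (recall, precision) pairs, one per positive example."""
    # Re-order the sample so that f(x_(1)) >= ... >= f(x_(m)).
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranked_labels = [labels[i] for i in order]
    k = sum(ranked_labels)               # number of positive examples
    points = []
    j = 0                                # positives seen so far
    for rank, y in enumerate(ranked_labels, start=1):
        if y == 1:                       # rank is i_j for this positive
            j += 1
            points.append((j / k, j / rank))   # (r_j, p_j)
    return points

# Toy example: positives end up at ranks 1, 3, and 5.
print(precision_recall_points([0.9, 0.8, 0.7, 0.6, 0.5], [1, 0, 1, 0, 1]))
```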
Precision-Recall: An Example
After reordering:
[Figure: the sample values f(x_(i)) in decreasing order, with the positive examples marked.]
Break-Even Point
[Figure: precision-recall curve, recall on the x-axis and precision on the y-axis (both 0 to 1), with the break-even point marked where precision equals recall.]
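A tiny sketch of how one might locate the break-even point from discrete precision-recall points; since a finite curve rarely hits the precision = recall diagonal exactly, taking the point with the smallest gap is my assumption, not the slides' convention:

```python
# The break-even point is where precision equals recall; with finitely many
# points, pick the (recall, precision) pair nearest the diagonal.

def break_even(points):
    """points: list of (recall, precision) pairs; return the pair with the
    smallest |precision - recall| gap."""
    return min(points, key=lambda rp: abs(rp[1] - rp[0]))

# Example: three points of a precision-recall curve.
print(break_even([(1/3, 1.0), (2/3, 2/3), (1.0, 0.6)]))
```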
Average Precision
[Figure: precision-recall curve, recall on the x-axis and precision on the y-axis (both 0 to 1), illustrating average precision.]
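The slide's own formula was an image; under the standard definition, consistent with the notation on the earlier slides, average precision is the mean of the precision values at the k positive examples:

```latex
A(f, z) \;=\; \frac{1}{k} \sum_{j=1}^{k} p_j \;=\; \frac{1}{k} \sum_{j=1}^{k} \frac{j}{i_j}.
```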
Stability Analysis
Case 1: y_i = 0
Case 2: y_i = 1
Proof
Case 1: y_i = 0
Case 2: y_i = 1
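The per-case bounds were images and did not survive. As a hedged illustration of what the stability analysis measures, this sketch (example data and naming are mine, not the slides' derivation) perturbs one example's score and records the largest resulting change in average precision, with the labels, and hence k, held fixed:

```python
# Empirical stability check: how much can average precision A(f, z) move when
# a single example is changed?

def average_precision(scores, labels):
    """A(f, z) = (1/k) * sum_{j=1}^{k} j / i_j, i_j = rank of j-th positive."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranked = [labels[i] for i in order]
    k = sum(ranked)
    total, j = 0.0, 0
    for rank, y in enumerate(ranked, start=1):
        if y == 1:
            j += 1
            total += j / rank            # precision at the j-th positive
    return total / k

scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
labels = [1, 0, 1, 1, 0, 0]              # k = 3 positives out of m = 6
base = average_precision(scores, labels)

worst = 0.0
for i in range(len(scores)):
    for new_score in (0.95, 0.05):       # push example i to the top or bottom
        perturbed = scores[:i] + [new_score] + scores[i + 1:]
        worst = max(worst, abs(average_precision(perturbed, labels) - base))
print(f"A(f, z) = {base:.3f}, worst single-example change = {worst:.3f}")
```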
Main Result
Theorem: For all probability measures, for all λ > 1/m, and for all f: X → R, with probability at least 1 − δ over the IID draw of a training and a test sample, both of size m: if both the training sample z and the test sample z̃ contain at least ⌈λm⌉ positive examples, then the average precisions on z and z̃ differ by at most a term that vanishes as m → ∞.
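The bound itself was an image that did not survive. Given the McDiarmid argument on the proof slide, with stability of order 1/(λm) applied to n = 2m points, it should take roughly the shape below, up to an unspecified constant C; this is a reconstruction under those assumptions, not the slide's exact statement:

```latex
\bigl| A(f, z) - A(f, \tilde{z}) \bigr|
  \;\le\; \frac{C}{\lambda} \sqrt{\frac{\ln(2/\delta)}{m}} .
```

Equivalently, the deviation behaves as if the sample size were λ²m.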
Proof
1. McDiarmid's inequality: for any function g: Z^n → R with stability c, for all probability measures P, with probability at least 1 − δ over the IID draw of Z, g(Z) deviates from its expectation by at most c √((n/2) ln(1/δ)).
2. Set n = 2m and call the two m-halves Z_1 and Z_2. Define g_i(Z) := A(f, Z_i). Then, by the IID assumption, g_1 and g_2 have the same expectation, so McDiarmid applies to their difference.
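The displayed inequality in step 1 was lost in extraction; the standard two-sided statement of McDiarmid's inequality for a function with uniform bounded differences (stability) c is:

```latex
P\Bigl( \bigl| g(Z) - \mathbb{E}[g(Z)] \bigr| \ge \varepsilon \Bigr)
  \;\le\; 2 \exp\!\left( - \frac{2\varepsilon^2}{n c^2} \right),
```

i.e. with probability at least 1 − δ, |g(Z) − E[g(Z)]| ≤ c √((n/2) ln(2/δ)).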
Discussions
First bound showing that, asymptotically (m → ∞), training and test set performance (in terms of average precision) converge!
The effective sample size is only the number of positive examples; in fact, only λ²m.
The proof can be generalised to arbitrary test sample sizes.
The constants can be improved.
Applications
Cardinality bounds
Compression bounds (TREC 2002)
No VC bounds!
No margin bounds!
Union bound:
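The equation after "Union bound:" was lost. For a finite function class F, summing the failure probabilities of the main theorem over each f ∈ F gives the standard step (my reconstruction):

```latex
P\Bigl( \exists f \in F :\; \bigl| A(f,z) - A(f,\tilde{z}) \bigr| > \varepsilon \Bigr)
  \;\le\; \sum_{f \in F} P\Bigl( \bigl| A(f,z) - A(f,\tilde{z}) \bigr| > \varepsilon \Bigr)
  \;\le\; |F| \, \delta ,
```

so a statement uniform over F costs a factor |F| in δ (equivalently, run the theorem at confidence δ/|F| per function); this is what yields the cardinality bounds above.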
Conclusions
Ranking learning requires non-point-wise loss functions.
In order to study the complexity of algorithms, we need large deviation inequalities for ranking performance measures.
McDiarmid's inequality is a powerful tool.
Future work is focused on ROC curves.