
Analysis of Precision and Recall

Date post: 14-Apr-2018

Transcript
  • 7/30/2019 Analysis of Precision and Recall

    1/15


    A Statistical Analysis of the Precision-Recall Graph

    Ralf Herbrich

    Microsoft Research

    UK

    Joint work with Hugo Zaragoza and Simon Hill

  • 2/15

    Overview

    The Precision-Recall Graph

    A Stability Analysis

    Main Result

    Discussion and Applications

    Conclusions

  • 3/15

    Features of Ranking Learning

    We cannot take differences of ranks.

    We cannot ignore the order of ranks.

    Point-wise loss functions do not capture the ranking performance!

    ROC or precision-recall curves do capture the ranking performance.

    We need generalisation error bounds for ROC and precision-recall curves!

  • 4/15

    Precision and Recall

    Given: a sample z = ((x1, y1), ..., (xm, ym)) ∈ (X × {0,1})^m with k positive yi, together with a function f: X → R.

    Ranking the sample: re-order the sample so that f(x(1)) ≥ ... ≥ f(x(m)).

    Record the indices i1, ..., ik of the positive y(j).

    Precision pj and recall rj:

    pj = j / ij,   rj = j / k
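    The definitions on this slide can be sketched in code. `precision_recall` is an illustrative helper of my own (not from the talk), assuming the slide's definitions: after ranking by descending score, the j-th positive sits at rank i_j, and p_j = j / i_j, r_j = j / k.

```python
def precision_recall(scores, labels):
    """Return ([p_1, ..., p_k], [r_1, ..., r_k]) for a binary-labelled sample."""
    # Re-order the sample so that f(x_(1)) >= ... >= f(x_(m)).
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranked = [labels[i] for i in order]
    k = sum(ranked)                      # number of positives
    precision, recall = [], []
    j = 0
    for pos, y in enumerate(ranked, start=1):
        if y == 1:                       # j-th positive found at rank i_j = pos
            j += 1
            precision.append(j / pos)    # p_j = j / i_j
            recall.append(j / k)         # r_j = j / k
    return precision, recall

p, r = precision_recall([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
# positives sit at ranks 1 and 3, so p == [1.0, 2/3] and r == [0.5, 1.0]
```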

  • 5/15

    Precision-Recall: An Example

    After reordering by f(x(i)):

    [Figure: the sample re-ordered by descending f(x(i)), with precision and recall read off at each positive example.]

  • 6/15

    Break-Even Point

    [Figure: precision-recall curve with Recall on the x-axis and Precision on the y-axis, both from 0 to 1; the break-even point is marked where the curve crosses the diagonal precision = recall.]
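    A minimal way to read the break-even point off an already-computed curve. This helper is my own illustration, not the talk's definition: it simply picks the curve point closest to the diagonal precision == recall rather than interpolating.

```python
def break_even(precision, recall):
    """Approximate break-even point of a precision-recall curve:
    the point where the curve crosses precision == recall."""
    j = min(range(len(precision)),
            key=lambda i: abs(precision[i] - recall[i]))
    return (precision[j] + recall[j]) / 2

# A curve that touches the diagonal at 0.75:
print(break_even([1.0, 0.75, 0.6], [0.25, 0.75, 1.0]))  # -> 0.75
```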

  • 7/15

    Average Precision

    [Figure: precision-recall curve with Recall on the x-axis and Precision on the y-axis, both from 0 to 1, illustrating average precision.]
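    Average precision, A(f, z) = (1/k) * sum over j of j / i_j, where i_1 < ... < i_k are the ranks of the positives after sorting by descending score; this is the quantity analysed in the rest of the talk. A minimal sketch (the function name and interface are mine):

```python
def average_precision(positive_ranks):
    """A(f, z) = (1/k) * sum_{j=1..k} j / i_j for positive ranks i_1 < ... < i_k."""
    ranks = sorted(positive_ranks)
    k = len(ranks)
    return sum(j / i for j, i in enumerate(ranks, start=1)) / k

# Positives at ranks 1 and 3: (1/1 + 2/3) / 2
print(average_precision([1, 3]))  # -> 0.8333...
```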

  • 8/15

  • 9/15

    Stability Analysis

    Case 1: yi=0

    Case 2: yi=1

  • 10/15

    Proof

    Case 1: yi=0

    Case 2: yi=1

  • 11/15

    Main Result

    Theorem: For all probability measures, for all κ > 1/m, for all f: X → R, with probability at least 1 − δ over the IID draw of a training and a test sample both of size m, if both the training sample z and the test sample z̃ contain at least ⌈κm⌉ positive examples, then:
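    The theorem's content can be illustrated with a toy Monte-Carlo experiment (entirely my own construction, not from the talk): draw two independent samples of size m from a synthetic scorer in which positives get stochastically higher scores, and compare the average precision on the two samples. The gap between the two values shrinks as m grows, which is what the bound asserts.

```python
import random

def average_precision(labels_ranked):
    """A(f, z) for labels already sorted by descending score."""
    k = sum(labels_ranked)
    j, total = 0, 0.0
    for pos, y in enumerate(labels_ranked, start=1):
        if y:
            j += 1
            total += j / pos
    return total / k

def sample_ap(m, rng):
    # Synthetic scorer: ~30% positives, whose scores are shifted up by 1.
    data = [(rng.gauss(1.0 if y else 0.0, 1.0), y)
            for y in (rng.random() < 0.3 for _ in range(m))]
    data.sort(key=lambda t: -t[0])       # rank by descending score
    return average_precision([y for _, y in data])

rng = random.Random(0)
for m in (100, 10000):
    gap = abs(sample_ap(m, rng) - sample_ap(m, rng))
    print(m, round(gap, 3))              # the gap shrinks for larger m
```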

  • 12/15

    Proof

    1. McDiarmid's inequality: For any function g: Z^n → R with stability c, for all probability measures P, with probability at least 1 − δ over the IID draw of Z:

    2. Set n = 2m and call the two m-halves Z1 and Z2. Define gi(Z) := A(f, Zi). Then, by IID:
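    For reference, the bounded-differences inequality invoked in step 1 reads, in its standard form with per-coordinate stability constants c_i:

```latex
% If g : Z^n -> R satisfies, for all i and all z, z'_i,
%   |g(z_1,...,z_i,...,z_n) - g(z_1,...,z'_i,...,z_n)| <= c_i,
% then for every product measure and every epsilon > 0:
\Pr\left( g(Z_1,\dots,Z_n) - \mathbb{E}\,g(Z_1,\dots,Z_n) \geq \varepsilon \right)
\leq \exp\!\left( -\frac{2\varepsilon^2}{\sum_{i=1}^{n} c_i^2} \right)
```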

  • 13/15

    Discussions

    The first bound which shows that, asymptotically (m → ∞), training and test set performance (in terms of average precision) converge!

    The effective sample size is only the number of positive examples; in fact, only κ²m.

    The proof can be generalised to arbitrary test sample sizes.

    The constants can be improved.

  • 14/15

    Applications

    Cardinality bounds

    Compression bounds (TREC 2002)

    No VC bounds!

    No margin bounds!

    Union bound:

  • 15/15

    Conclusions

    Ranking learning requires considering non-point-wise loss functions.

    In order to study the complexity of algorithms, we need large deviation inequalities for ranking performance measures.

    McDiarmid's inequality is a powerful tool.

    Future work is focused on ROC curves.

