Probabilistic Approach to IR Binary independence model Okapi BM25 Introduction to Information Retrieval http://informationretrieval.org IIR 11: Probabilistic Information Retrieval Hinrich Schütze Institute for Natural Language Processing, Universität Stuttgart 2011-08-29 Schütze: Probabilistic Information Retrieval 1 / 36
Probabilistic Approach to IR Binary independence model Okapi BM25

Introduction to Information Retrieval
http://informationretrieval.org

IIR 11: Probabilistic Information Retrieval

Hinrich Schütze

Institute for Natural Language Processing, Universität Stuttgart

2011-08-29

Schütze: Probabilistic Information Retrieval 1 / 36

Models and Methods

1 Boolean model and its limitations (30)

2 Vector space model (30)

3 Probabilistic models (30)

4 Language model-based retrieval (30)

5 Latent semantic indexing (30)

6 Learning to rank (30)

Take-away

Probabilistic approach to IR: Introduction

Binary independence model (BIM): the first influential probabilistic model

Okapi BM25: a more modern, better-performing probabilistic model

Outline

1 Probabilistic Approach to IR

2 Binary independence model

3 Okapi BM25

Probabilistic approach to IR

The ad hoc retrieval problem: Given a user information need and a collection of documents, the IR system must determine how well the documents satisfy the query.

The IR system has an uncertain understanding of the user query and makes an uncertain guess of whether a document satisfies the query.

Probability theory provides a principled foundation for such reasoning under uncertainty.

Probabilistic IR models exploit this foundation to estimate how likely it is that a document is relevant to a query.

Probabilistic vs. vector space model

Vector space model: rank documents according to similarity to query.

The notion of similarity does not translate directly into an assessment of “is the document a good document to give to the user or not?”

The most similar document can be highly relevant or completely nonrelevant.

Probability theory is arguably a cleaner formalization of what we really want an IR system to do: give relevant documents to the user.

Probabilistic IR models at a glance

Classical probabilistic retrieval models

  Binary Independence Model

  Okapi BM25

Bayesian networks for text retrieval

  Don’t have time for this

Language model approach to IR

  Important recent work, will be covered in the next lecture

Probabilistic IR and ranking

Ranked retrieval setup: the user issues a query, and a ranked list of documents is returned.

How can we rank probabilistically?

Let $R_{d,q}$ be a dichotomous random variable such that

$R_{d,q} = 1$ if document $d$ is relevant w.r.t. query $q$

$R_{d,q} = 0$ otherwise

(This is a binary notion of relevance.)

Probabilistic ranking orders documents in decreasing order of their estimated probability of relevance w.r.t. the query: $P(R = 1 \mid d, q)$
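
As a minimal sketch of this ranking step, assume some model has already produced an estimate of $P(R = 1 \mid d, q)$ for each document. The document ids, the probability values, and the helper name `rank_by_probability` are all made up for illustration; how the estimates are obtained is the subject of the rest of the lecture.

```python
# Hypothetical estimated probabilities P(R = 1 | d, q) for one query.
# The document ids and values are invented for illustration.
estimated_relevance = {
    "d1": 0.15,
    "d2": 0.80,
    "d3": 0.40,
    "d4": 0.05,
}


def rank_by_probability(prob_by_doc):
    """Return document ids sorted by decreasing estimated P(R = 1 | d, q)."""
    return sorted(prob_by_doc, key=prob_by_doc.get, reverse=True)


ranking = rank_by_probability(estimated_relevance)
print(ranking)
```

Only the order of the estimates matters for the ranked list, which is why later slides are free to replace the probability by any quantity that is monotone in it.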

How can we justify this way of proceeding?

Probability Ranking Principle (PRP)

If the retrieved documents are ranked decreasingly on their probability of relevance (w.r.t. a query), then the effectiveness of the system will be the best that is obtainable.

Fundamental assumption: the relevance of each document is independent of the relevance of other documents.
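
The principle can be checked by brute force on a toy example. The sketch below takes "effectiveness" to mean the expected number of relevant documents among the top k, which is an illustrative choice and not a measure the slides commit to. The probabilities are invented. By linearity of expectation, the expected count is the sum of the individual probabilities of the top-k documents.

```python
from itertools import permutations

# Hypothetical probabilities of relevance for four documents (invented values).
prob_relevant = {"d1": 0.15, "d2": 0.80, "d3": 0.40, "d4": 0.05}


def expected_relevant_in_top_k(order, k):
    """Expected number of relevant documents among the first k of `order`.

    By linearity of expectation this is the sum of the individual
    probabilities of relevance of those k documents.
    """
    return sum(prob_relevant[d] for d in order[:k])


# PRP ordering: decreasing probability of relevance.
prp_order = sorted(prob_relevant, key=prob_relevant.get, reverse=True)

# Compare the PRP ordering against every possible ordering at every cutoff k.
prp_is_optimal = all(
    expected_relevant_in_top_k(prp_order, k)
    >= expected_relevant_in_top_k(list(other), k) - 1e-12
    for other in permutations(prob_relevant)
    for k in range(1, len(prob_relevant) + 1)
)
print(prp_order, prp_is_optimal)
```

The independence assumption is what allows each document to carry a single probability that does not change with what has already been shown to the user.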

Binary Independence Model (BIM)

Binary: documents and queries are represented as binary term incidence vectors

Independence: terms are independent of each other (not true, but works in practice; this is the naive assumption of Naive Bayes models)

Binary incidence matrix

            Anthony and   Julius   The       Hamlet   Othello   Macbeth   ...
            Cleopatra     Caesar   Tempest
Anthony     1             1        0         0        0         1
Brutus      1             1        0         1        0         0
Caesar      1             1        0         1        1         1
Calpurnia   0             1        0         0        0         0
Cleopatra   1             0        0         0        0         0
mercy       1             0        1         1        1         1
worser      1             0        1         1        1         0
...

Each document is represented as a binary vector $\in \{0, 1\}^{|V|}$.
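
A minimal sketch of how such binary vectors could be built. The three toy documents, the whitespace tokenization, and the helper name `incidence_vector` are assumptions for illustration and are unrelated to the Shakespeare collection in the matrix.

```python
# Toy collection; the texts are invented for illustration only.
docs = {
    "doc1": "brutus killed caesar",
    "doc2": "caesar and calpurnia",
    "doc3": "mercy for brutus",
}

# Vocabulary V: all distinct terms, in a fixed (sorted) order.
vocabulary = sorted({term for text in docs.values() for term in text.split()})


def incidence_vector(text, vocab):
    """Binary vector in {0, 1}^{|V|}: 1 if the term occurs in the text, else 0."""
    present = set(text.split())
    return [1 if term in present else 0 for term in vocab]


vectors = {doc_id: incidence_vector(text, vocabulary) for doc_id, text in docs.items()}
print(vocabulary)
for doc_id, vec in vectors.items():
    print(doc_id, vec)
```

Term frequency and word order are discarded: a term that occurs ten times yields the same 1 as a term that occurs once.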

Bayes’ rule

\[
P(R = 1 \mid \vec{x}, \vec{q}) = \frac{P(\vec{x} \mid R = 1, \vec{q})\, P(R = 1 \mid \vec{q})}{P(\vec{x} \mid \vec{q})}
\]

\[
P(R = 0 \mid \vec{x}, \vec{q}) = \frac{P(\vec{x} \mid R = 0, \vec{q})\, P(R = 0 \mid \vec{q})}{P(\vec{x} \mid \vec{q})}
\]

(Recall that document and query are modeled as term incidence vectors: $\vec{x}$ and $\vec{q}$.)

$P(\vec{x} \mid R = 1, \vec{q})$ and $P(\vec{x} \mid R = 0, \vec{q})$: probability that if a relevant or nonrelevant document is retrieved, then that document’s representation is $\vec{x}$

Use statistics about the document collection to estimate these probabilities

Priors

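$P(R \mid d, q)$ is modeled using term incidence vectors as $P(R \mid \vec{x}, \vec{q})$

\[
P(R = 1 \mid \vec{x}, \vec{q}) = \frac{P(\vec{x} \mid R = 1, \vec{q})\, P(R = 1 \mid \vec{q})}{P(\vec{x} \mid \vec{q})}
\]

\[
P(R = 0 \mid \vec{x}, \vec{q}) = \frac{P(\vec{x} \mid R = 0, \vec{q})\, P(R = 0 \mid \vec{q})}{P(\vec{x} \mid \vec{q})}
\]

$P(R = 1 \mid \vec{q})$ and $P(R = 0 \mid \vec{q})$: prior probability of retrieving a relevant or nonrelevant document for a query $\vec{q}$

Estimate $P(R = 1 \mid \vec{q})$ and $P(R = 0 \mid \vec{q})$ from the percentage of relevant documents in the collection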
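
To make the roles of likelihood, prior and normalizer concrete, here is a small numeric sketch of Bayes' rule as written above. The function name `posterior_relevance` and all numbers are hypothetical; the normalizer $P(\vec{x} \mid \vec{q})$ is obtained from the law of total probability over $R \in \{0, 1\}$.

```python
def posterior_relevance(lik_rel, lik_nonrel, prior_rel):
    """P(R = 1 | x, q) via Bayes' rule.

    lik_rel    = P(x | R = 1, q)
    lik_nonrel = P(x | R = 0, q)
    prior_rel  = P(R = 1 | q), so P(R = 0 | q) = 1 - prior_rel
    """
    prior_nonrel = 1.0 - prior_rel
    # P(x | q) by the law of total probability over R in {0, 1}.
    evidence = lik_rel * prior_rel + lik_nonrel * prior_nonrel
    return lik_rel * prior_rel / evidence


# Invented numbers: 1% of the collection is assumed relevant to the query,
# and the representation x is 20 times more likely among relevant documents.
p = posterior_relevance(lik_rel=0.02, lik_nonrel=0.001, prior_rel=0.01)
print(round(p, 4))
```

Even with a likelihood 20 times higher under relevance, the small prior keeps the posterior well below one half in this example.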

Ranking according to odds

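We said that we’re going to rank documents according to $P(R = 1 \mid \vec{x}, \vec{q})$

Easier: rank documents by their odds of relevance (gives same ranking)

\[
O(R \mid \vec{x}, \vec{q})
= \frac{P(R = 1 \mid \vec{x}, \vec{q})}{P(R = 0 \mid \vec{x}, \vec{q})}
= \frac{\dfrac{P(R = 1 \mid \vec{q})\, P(\vec{x} \mid R = 1, \vec{q})}{P(\vec{x} \mid \vec{q})}}{\dfrac{P(R = 0 \mid \vec{q})\, P(\vec{x} \mid R = 0, \vec{q})}{P(\vec{x} \mid \vec{q})}}
= \frac{P(R = 1 \mid \vec{q})}{P(R = 0 \mid \vec{q})} \cdot \frac{P(\vec{x} \mid R = 1, \vec{q})}{P(\vec{x} \mid R = 0, \vec{q})}
\]

$\frac{P(R = 1 \mid \vec{q})}{P(R = 0 \mid \vec{q})}$ is a constant for a given query and can be ignored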
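
The claim that odds give the same ranking follows because $p / (1 - p)$ is strictly increasing in $p$, and dividing by a positive per-query constant does not change the order either. A quick sketch with invented probabilities and an assumed value for the prior odds:

```python
def odds(p):
    """Odds O = p / (1 - p) for a probability 0 <= p < 1."""
    return p / (1.0 - p)


# Hypothetical values of P(R = 1 | x, q) for four documents.
prob_relevant = {"d1": 0.15, "d2": 0.80, "d3": 0.40, "d4": 0.05}

by_probability = sorted(prob_relevant, key=prob_relevant.get, reverse=True)
by_odds = sorted(prob_relevant, key=lambda d: odds(prob_relevant[d]), reverse=True)

# Dividing every score by the same positive constant (here the prior odds
# P(R = 1 | q) / P(R = 0 | q), value assumed) leaves the order unchanged.
prior_odds = 0.01 / 0.99
by_likelihood_ratio = sorted(
    prob_relevant, key=lambda d: odds(prob_relevant[d]) / prior_odds, reverse=True
)
print(by_probability == by_odds == by_likelihood_ratio)
```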

Naive Bayes conditional independence assumption

Now we make the Naive Bayes conditional independence assumption that the presence or absence of a word in a document is independent of the presence or absence of any other word (given the query):

\[
\frac{P(\vec{x} \mid R = 1, \vec{q})}{P(\vec{x} \mid R = 0, \vec{q})}
= \frac{\prod_{t=1}^{M} P(x_t \mid R = 1, \vec{q})}{\prod_{t=1}^{M} P(x_t \mid R = 0, \vec{q})}
\]

So:

\[
O(R \mid \vec{x}, \vec{q}) \propto \prod_{t=1}^{M} \frac{P(x_t \mid R = 1, \vec{q})}{P(x_t \mid R = 0, \vec{q})}
\]
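
A small numeric sketch of this factorization for $M = 3$ terms. The per-term probabilities and the helper names `term_prob` and `likelihood` are assumptions made for illustration. The code checks that the ratio of the two products equals the product of the per-term ratios.

```python
from math import prod

# Assumed per-term probabilities P(x_t = 1 | R, q) for M = 3 terms.
# All numbers are invented for illustration.
p_present_rel = [0.8, 0.5, 0.1]      # given R = 1
p_present_nonrel = [0.2, 0.5, 0.3]   # given R = 0


def term_prob(x_t, p_present):
    """P(x_t | R, q) for a binary x_t, given P(x_t = 1 | R, q)."""
    return p_present if x_t == 1 else 1.0 - p_present


def likelihood(x, p_present_list):
    """P(x | R, q) under the conditional independence assumption."""
    return prod(term_prob(x_t, p) for x_t, p in zip(x, p_present_list))


x = [1, 1, 1]  # a document's term incidence vector

ratio_of_products = likelihood(x, p_present_rel) / likelihood(x, p_present_nonrel)
product_of_ratios = prod(
    term_prob(x_t, p) / term_prob(x_t, u)
    for x_t, p, u in zip(x, p_present_rel, p_present_nonrel)
)
print(ratio_of_products, product_of_ratios)
```

Without the independence assumption one would need a probability for each of the $2^M$ possible vectors; with it, $M$ numbers per relevance class suffice.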

Separating terms in the document vs. not

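Since each $x_t$ is either 0 or 1, we can separate the terms:

\[
O(R \mid \vec{x}, \vec{q}) \propto
\prod_{t : x_t = 1} \frac{P(x_t = 1 \mid R = 1, \vec{q})}{P(x_t = 1 \mid R = 0, \vec{q})}
\cdot
\prod_{t : x_t = 0} \frac{P(x_t = 0 \mid R = 1, \vec{q})}{P(x_t = 0 \mid R = 0, \vec{q})}
\]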

Definition of $p_t$ and $u_t$

Let $p_t = P(x_t = 1 \mid R = 1, \vec{q})$ be the probability of a term appearing in a relevant document.

Let $u_t = P(x_t = 1 \mid R = 0, \vec{q})$ be the probability of a term appearing in a nonrelevant document.

Can be displayed as a contingency table:

                             R = 1      R = 0
term present   x_t = 1       p_t        u_t
term absent    x_t = 0       1 - p_t    1 - u_t

\[
O(R \mid \vec{x}, \vec{q}) \propto
\prod_{t : x_t = 1} \frac{p_t}{u_t}
\cdot
\prod_{t : x_t = 0} \frac{1 - p_t}{1 - u_t}
\]
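
The two products can be evaluated directly once $p_t$ and $u_t$ are available. In the sketch below the three-term vocabulary, the estimates, and the function name `odds_factor` are hypothetical; the returned value is proportional to $O(R \mid \vec{x}, \vec{q})$, with the prior odds left out.

```python
from math import prod

# Assumed estimates for a three-term vocabulary (values invented).
p = {"brutus": 0.8, "caesar": 0.5, "mercy": 0.1}   # p_t = P(x_t = 1 | R = 1, q)
u = {"brutus": 0.2, "caesar": 0.5, "mercy": 0.3}   # u_t = P(x_t = 1 | R = 0, q)


def odds_factor(doc_terms):
    """Product over present terms of p_t / u_t times product over absent
    terms of (1 - p_t) / (1 - u_t); proportional to O(R | x, q)."""
    present = prod(p[t] / u[t] for t in p if t in doc_terms)
    absent = prod((1 - p[t]) / (1 - u[t]) for t in p if t not in doc_terms)
    return present * absent


score_a = odds_factor({"brutus", "caesar"})
score_b = odds_factor({"mercy"})
print(score_a, score_b)
```

A term with $p_t > u_t$ raises the score when present and lowers it when absent, so the document containing "brutus" ranks above the one that lacks it.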

Dropping terms that don’t occur in the query

Additional simplifying assumption: If qt = 0, then pt = ut

A term not occurring in the query is equally likely to occur inrelevant and nonrelevant documents.

Now we need only to consider terms in the products thatappear in the query:

Schutze: Probabilistic Information Retrieval 20 / 36

Page 63: Introduction to Information Retrieval ` `%%%`# ` …nlp.stanford.edu/IR-book/essir2011/pdf/11prob.pdf · Probabilistic Approach to IR Binary independence model Okapi BM25 Introduction

Probabilistic Approach to IR Binary independence model Okapi BM25

Dropping terms that don’t occur in the query

Additional simplifying assumption: If qt = 0, then pt = ut

A term not occurring in the query is equally likely to occur inrelevant and nonrelevant documents.

Now we need only to consider terms in the products thatappear in the query:

Schutze: Probabilistic Information Retrieval 20 / 36

Dropping terms that don’t occur in the query

Additional simplifying assumption: if $q_t = 0$, then $p_t = u_t$.

A term not occurring in the query is equally likely to occur in relevant and nonrelevant documents.

Now we need to consider only those terms in the products that appear in the query:

$$O(R \mid \vec{x}, \vec{q}) \propto \prod_{t:\, x_t = 1} \frac{p_t}{u_t} \cdot \prod_{t:\, x_t = 0} \frac{1 - p_t}{1 - u_t} \approx \prod_{t:\, x_t = q_t = 1} \frac{p_t}{u_t} \cdot \prod_{t:\, x_t = 0,\, q_t = 1} \frac{1 - p_t}{1 - u_t}$$
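The effect of this assumption can be checked numerically: when $p_t = u_t$ for every non-query term, each such term contributes a factor of 1, so the product over the whole vocabulary matches the product over query terms only. The sketch below uses a hypothetical four-term vocabulary; the name `odds_factor` and all values are assumptions.

```python
import math

def odds_factor(x, p, u, terms):
    """Product of p_t/u_t (term present) or (1-p_t)/(1-u_t) (term absent) over `terms`."""
    factor = 1.0
    for t in terms:
        if x.get(t, 0):
            factor *= p[t] / u[t]
        else:
            factor *= (1 - p[t]) / (1 - u[t])
    return factor

# Hypothetical vocabulary; the query contains only "bm25" and "ranking".
query = {"bm25", "ranking"}
p = {"bm25": 0.7, "ranking": 0.6, "weather": 0.1, "the": 0.9}
u = {"bm25": 0.1, "ranking": 0.3, "weather": 0.1, "the": 0.9}  # p_t = u_t off-query
doc = {"bm25": 1, "ranking": 0, "weather": 1, "the": 0}

full = odds_factor(doc, p, u, p.keys())      # all vocabulary terms
restricted = odds_factor(doc, p, u, query)   # query terms only
print(full, restricted)
```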

BIM retrieval status value

Including the query terms found in the document in the right product, but simultaneously dividing by them in the left product, gives:

$$O(R \mid \vec{x}, \vec{q}) \propto \prod_{t:\, x_t = q_t = 1} \frac{p_t (1 - u_t)}{u_t (1 - p_t)} \cdot \prod_{t:\, q_t = 1} \frac{1 - p_t}{1 - u_t}$$

The right product is now over all query terms, hence constant for a particular query, and can be ignored.

→ The only quantity that needs to be estimated to rank documents w.r.t. a query is the left product.

Hence the Retrieval Status Value (RSV) in this model:

$$\mathrm{RSV}_d = \log \prod_{t:\, x_t = q_t = 1} \frac{p_t (1 - u_t)}{u_t (1 - p_t)} = \sum_{t:\, x_t = q_t = 1} \log \frac{p_t (1 - u_t)}{u_t (1 - p_t)}$$
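To see why the right product can be dropped, the sketch below ranks four toy documents twice: once by RSV alone and once by RSV plus the log of the query-constant product. The helper names `rsv` and `query_constant` and the probability estimates are hypothetical.

```python
import math

def rsv(doc_terms, query_terms, p, u):
    """Sum of log[p_t(1-u_t) / (u_t(1-p_t))] over query terms present in the document."""
    return sum(
        math.log(p[t] * (1 - u[t]) / (u[t] * (1 - p[t])))
        for t in query_terms
        if t in doc_terms
    )

def query_constant(query_terms, p, u):
    """Log of the right product over all query terms; the same for every document."""
    return sum(math.log((1 - p[t]) / (1 - u[t])) for t in query_terms)

# Hypothetical estimates for a two-term query.
p = {"okapi": 0.8, "bm25": 0.6}
u = {"okapi": 0.2, "bm25": 0.3}
query = ["okapi", "bm25"]
docs = {"d1": {"okapi", "bm25"}, "d2": {"okapi"}, "d3": {"bm25"}, "d4": set()}

by_rsv = sorted(docs, key=lambda d: rsv(docs[d], query, p, u), reverse=True)
by_log_odds = sorted(
    docs,
    key=lambda d: rsv(docs[d], query, p, u) + query_constant(query, p, u),
    reverse=True,
)
print(by_rsv, by_log_odds)
```

Adding the same constant to every score shifts the values but leaves the order unchanged, which is all that ranking needs.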

BIM retrieval status value (2)

Equivalent: rank documents using the log odds ratios $c_t$ for the terms in the query:

$$c_t = \log \frac{p_t (1 - u_t)}{u_t (1 - p_t)} = \log \frac{p_t}{1 - p_t} - \log \frac{u_t}{1 - u_t}$$

The odds ratio is the ratio of two odds: (i) the odds of the term appearing if the document is relevant ($p_t / (1 - p_t)$), and (ii) the odds of the term appearing if the document is nonrelevant ($u_t / (1 - u_t)$).

$c_t = 0$: term has equal odds of appearing in relevant and nonrelevant docs

$c_t$ positive: higher odds to appear in relevant documents

$c_t$ negative: higher odds to appear in nonrelevant documents
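The three cases can be reproduced with a few lines of Python. This is only a sketch; the function name `log_odds_ratio` and the probabilities passed to it are made up for illustration.

```python
import math

def log_odds_ratio(p_t, u_t):
    """c_t = log[p_t / (1 - p_t)] - log[u_t / (1 - u_t)]."""
    return math.log(p_t / (1 - p_t)) - math.log(u_t / (1 - u_t))

print(log_odds_ratio(0.3, 0.3))  # equal odds in both classes
print(log_odds_ratio(0.6, 0.1))  # term favours relevant documents
print(log_odds_ratio(0.1, 0.6))  # term favours nonrelevant documents
```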

Term weight $c_t$ in BIM

$c_t = \log \frac{p_t}{1 - p_t} - \log \frac{u_t}{1 - u_t}$ functions as a term weight.

Retrieval status value for document $d$: $\mathrm{RSV}_d = \sum_{t:\, x_t = q_t = 1} c_t$.

So BIM and the vector space model are similar on an operational level.

In particular: we can use the same data structures (inverted index etc.) for the two models.
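One way to picture the operational similarity is term-at-a-time scoring over an inverted index, where each posting adds the term's weight to a per-document accumulator. The index layout, the function name `rank_with_inverted_index`, and the weights below are assumptions for this sketch, not a description of any particular system.

```python
from collections import defaultdict

def rank_with_inverted_index(query_terms, postings, weights):
    """Accumulate RSV_d = sum of c_t over query terms that occur in document d.

    postings: dict term -> list of ids of documents containing the term
    weights:  dict term -> term weight c_t
    """
    scores = defaultdict(float)
    for t in query_terms:
        for doc_id in postings.get(t, []):
            scores[doc_id] += weights[t]
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical index and term weights.
postings = {"okapi": [1, 3], "bm25": [1, 2]}
weights = {"okapi": 2.0, "bm25": 1.5}
ranking = rank_with_inverted_index(["okapi", "bm25"], postings, weights)
print(ranking)
```

Swapping the weights for tf-idf values would turn the same loop into a vector space scorer without length normalization.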

Computing term weights $c_t$

For each term $t$ in a query, estimate $c_t$ from a contingency table of document counts over the whole collection, where $\mathrm{df}_t$ is the number of documents that contain term $t$. As the table margins show, $N$ is the collection size, $S$ the number of relevant documents, and $s$ the number of relevant documents that contain $t$:

documents                   relevant     nonrelevant                          Total
Term present  $x_t = 1$     $s$          $\mathrm{df}_t - s$                  $\mathrm{df}_t$
Term absent   $x_t = 0$     $S - s$      $(N - \mathrm{df}_t) - (S - s)$      $N - \mathrm{df}_t$
Total                       $S$          $N - S$                              $N$

$$p_t = s / S$$

$$u_t = (\mathrm{df}_t - s) / (N - S)$$

$$c_t = K(N, \mathrm{df}_t, S, s) = \log \frac{s / (S - s)}{(\mathrm{df}_t - s) / ((N - \mathrm{df}_t) - (S - s))}$$
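The sketch below computes $c_t$ both from the probabilities $p_t$ and $u_t$ and directly from the four table cells, and checks that the two routes agree. The function name `term_weight` and the counts are hypothetical, and no smoothing is applied, so every cell must be nonzero.

```python
import math

def term_weight(N, df_t, S, s):
    """c_t = K(N, df_t, S, s), unsmoothed; all four table cells must be nonzero."""
    p_t = s / S
    u_t = (df_t - s) / (N - S)
    c_from_probs = math.log(p_t / (1 - p_t)) - math.log(u_t / (1 - u_t))
    c_from_counts = math.log(
        (s / (S - s)) / ((df_t - s) / ((N - df_t) - (S - s)))
    )
    # Both expressions describe the same log odds ratio.
    assert math.isclose(c_from_probs, c_from_counts)
    return c_from_counts

# Hypothetical counts: 1000 documents, 10 relevant,
# the term occurs in 50 documents, 8 of which are relevant.
c = term_weight(N=1000, df_t=50, S=10, s=8)
print(c)
```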

Avoiding zeros

If any of the counts is zero, then the term weight is not well-defined.

Maximum likelihood estimates do not work for rare events.

To avoid zeros: add 0.5 to each count (expected likelihood estimation = ELE) or use a different type of smoothing.
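A minimal sketch of the add-0.5 idea, assuming the four counts are the cells of the contingency table used for $c_t$. The function name `smoothed_term_weight`, the parameter `k`, and the example counts are illustrative assumptions.

```python
import math

def smoothed_term_weight(N, df_t, S, s, k=0.5):
    """c_t with k added to each of the four contingency-table cells (k = 0.5 for ELE)."""
    rel_present = s + k
    rel_absent = (S - s) + k
    nonrel_present = (df_t - s) + k
    nonrel_absent = (N - df_t) - (S - s) + k
    return math.log((rel_present / rel_absent) / (nonrel_present / nonrel_absent))

# Hypothetical counts where the term occurs in every relevant document (S - s = 0).
# The unsmoothed formula would divide by zero; the smoothed one stays finite.
c_all = smoothed_term_weight(N=1000, df_t=50, S=10, s=10)
# Hypothetical counts where the term occurs in no relevant document (s = 0).
c_none = smoothed_term_weight(N=1000, df_t=50, S=10, s=0)
print(c_all, c_none)
```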

More simplifying assumptions

Assume that relevant documents are a very small percentage of the collection . . .

. . . then we can approximate statistics for nonrelevant documents by statistics from the whole collection:

$$\log \frac{1 - u_t}{u_t} = \log \frac{N - \mathrm{df}_t}{\mathrm{df}_t} \approx \log \frac{N}{\mathrm{df}_t}$$

This should look familiar to you: it is the inverse document frequency (idf) weight.
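The last step of the approximation, replacing $N - \mathrm{df}_t$ by $N$, is accurate for rare terms and loose for very frequent ones. The sketch below prints the gap for a hypothetical collection of one million documents; the function name `approximation_gap` is an assumption.

```python
import math

def approximation_gap(N, df_t):
    """log(N/df_t) - log((N-df_t)/df_t), which equals -log(1 - df_t/N)."""
    return math.log(N / df_t) - math.log((N - df_t) / df_t)

N = 1_000_000  # hypothetical collection size
for df_t in (10, 1_000, 100_000, 500_000):
    print(f"df_t={df_t:>7}  gap={approximation_gap(N, df_t):.6f}")
```

For a term that occurs in half of the collection the two quantities differ by $\log 2$, so the approximation matters mostly for terms that carry little weight anyway.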

Probability estimates in relevance feedback

For relevance feedback, we can directly compute term weights $c_t$ based on the contingency table (using an appropriate smoothing method like ELE).

Computing term weights $c_t$ for relevance feedback

For each term $t$ in a query, estimate $c_t$ from a contingency table of document counts over the whole collection, where $\mathrm{df}_t$ is the number of documents that contain term $t$:

documents                   relevant     nonrelevant                          Total
Term present  $x_t = 1$     $s$          $\mathrm{df}_t - s$                  $\mathrm{df}_t$
Term absent   $x_t = 0$     $S - s$      $(N - \mathrm{df}_t) - (S - s)$      $N - \mathrm{df}_t$
Total                       $S$          $N - S$                              $N$

$$p_t = s / S$$

$$u_t = (\mathrm{df}_t - s) / (N - S)$$

$$c_t = K(N, \mathrm{df}_t, S, s) = \log \frac{s / (S - s)}{(\mathrm{df}_t - s) / ((N - \mathrm{df}_t) - (S - s))}$$
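In a feedback setting the counts come from the documents a user has judged. The sketch below derives them from a toy collection and adds 0.5 to each cell. It assumes, for simplicity, that every document not marked relevant counts as nonrelevant; the function name `feedback_weights` and the data are hypothetical.

```python
import math

def feedback_weights(query_terms, docs, relevant_ids, k=0.5):
    """Estimate c_t per query term from judged documents, adding k to each cell.

    docs: dict doc_id -> set of terms; relevant_ids: ids the user marked relevant.
    All other documents are treated as nonrelevant (an assumption of this sketch).
    """
    N, S = len(docs), len(relevant_ids)
    weights = {}
    for t in query_terms:
        df_t = sum(1 for terms in docs.values() if t in terms)
        s = sum(1 for d in relevant_ids if t in docs[d])
        odds_rel = (s + k) / (S - s + k)
        odds_nonrel = (df_t - s + k) / ((N - df_t) - (S - s) + k)
        weights[t] = math.log(odds_rel / odds_nonrel)
    return weights

# Hypothetical five-document collection with documents 1 and 2 judged relevant.
docs = {
    1: {"okapi", "bm25"},
    2: {"okapi", "ranking"},
    3: {"weather", "ranking"},
    4: {"weather"},
    5: {"ranking", "bm25"},
}
weights = feedback_weights(["okapi", "ranking"], docs, relevant_ids={1, 2})
print(weights)
```

Here "okapi" appears only in the relevant documents and receives a positive weight, while "ranking" is more common among the others and receives a negative one.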

Probability estimates in ad hoc retrieval

Ad hoc retrieval: no user-supplied relevance judgments available

In this case: assume constant $p_t = 0.5$ for all terms $x_t$ in the query

Each query term then has even odds of appearing in a relevant document, and so the $p_t$ and $(1 - p_t)$ factors cancel out in the expression for RSV.

Weak estimate, but doesn’t disagree violently with the expectation that query terms appear in many but not all relevant documents.

Weight $c_t$ in this case: $c_t = \log \frac{p_t}{1 - p_t} - \log \frac{u_t}{1 - u_t} \approx \log \frac{N}{\mathrm{df}_t}$

For short documents (titles or abstracts), this simple version of BIM works well.
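A short numeric check of this weight: with $p_t = 0.5$ the first logarithm is zero, and with $u_t$ taken as $\mathrm{df}_t / N$ the remainder is close to $\log N / \mathrm{df}_t$. The function name `adhoc_weight` and the collection statistics are assumptions for this sketch.

```python
import math

def adhoc_weight(N, df_t, p_t=0.5):
    """c_t with a constant p_t and u_t approximated by df_t / N."""
    u_t = df_t / N
    return math.log(p_t / (1 - p_t)) - math.log(u_t / (1 - u_t))

N, df_t = 1_000_000, 1_000  # hypothetical collection statistics
c = adhoc_weight(N, df_t)
idf = math.log(N / df_t)
print(c, idf)
```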

Outline

1 Probabilistic Approach to IR

2 Binary independence model

3 Okapi BM25

Okapi BM25: Overview

Okapi BM25 is a probabilistic model that incorporates term frequency (i.e., it’s nonbinary) and length normalization.

BIM was originally designed for short catalog records of fairly consistent length, and it works reasonably in these contexts.

For modern full-text search collections, a model should pay attention to term frequency and document length.

BM25 (Best Match 25) is sensitive to these quantities.

Okapi BM25: Starting point

In the simplest version of BIM, the score for document $d$ is just idf weighting of the query terms present in the document:

$$\mathrm{RSV}_d = \sum_{t \in q \cap d} \log \frac{N}{\mathrm{df}_t}$$
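In code, this starting point is a sum of idf values over the intersection of query and document terms. The function name `idf_rsv` and the document frequencies below are hypothetical.

```python
import math

def idf_rsv(query_terms, doc_terms, df, N):
    """RSV_d = sum of log(N / df_t) over terms that occur in both query and document."""
    return sum(math.log(N / df[t]) for t in set(query_terms) & set(doc_terms))

# Hypothetical document frequencies in a collection of 1000 documents.
df = {"okapi": 10, "model": 500}
score = idf_rsv(["okapi", "model"], {"okapi", "model", "retrieval"}, df, N=1000)
print(score)
```

The rare term "okapi" dominates the score, and how often a term occurs inside the document plays no role yet.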

Okapi BM25 basic weighting

Improve idf term $[\log N / \mathrm{df}_t]$ by factoring in term frequency and document length.

$$\mathrm{RSV}_d = \sum_{t \in q} \log \left[ \frac{N}{\mathrm{df}_t} \right] \cdot \frac{(k_1 + 1)\, \mathrm{tf}_{td}}{k_1 ((1 - b) + b \times (L_d / L_{ave})) + \mathrm{tf}_{td}}$$

$\mathrm{tf}_{td}$: term frequency in document $d$

$L_d$ ($L_{ave}$): length of document $d$ (average document length in the whole collection)

$k_1$: tuning parameter controlling scaling of term frequency

$b$: tuning parameter controlling the scaling by document length
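A minimal sketch of this formula in Python. The function name `bm25_rsv`, the token-list document representation, and the defaults `k1=1.2` and `b=0.75` are assumptions chosen for illustration; the slide does not fix these values.

```python
import math
from collections import Counter

def bm25_rsv(query_terms, doc_tokens, df, N, L_ave, k1=1.2, b=0.75):
    """BM25 basic weighting; the k1 and b defaults are illustrative, not prescribed."""
    tf = Counter(doc_tokens)
    L_d = len(doc_tokens)  # document length measured in tokens
    score = 0.0
    for t in query_terms:
        if tf[t] == 0:
            continue  # the tf factor is zero, so the term contributes nothing
        idf = math.log(N / df[t])
        tf_part = ((k1 + 1) * tf[t]) / (k1 * ((1 - b) + b * (L_d / L_ave)) + tf[t])
        score += idf * tf_part
    return score

# Hypothetical statistics: 1000 documents, "okapi" occurs in 10 of them.
df = {"okapi": 10}
short_doc = ["okapi", "bm25", "ranking", "model"]
long_doc = short_doc + ["filler"] * 4  # same tf for "okapi", twice the length
print(bm25_rsv(["okapi"], short_doc, df, N=1000, L_ave=4))
print(bm25_rsv(["okapi"], long_doc, df, N=1000, L_ave=4))
```

Two properties are easy to verify with this sketch: setting `k1=0` removes the term-frequency effect and leaves the plain idf score, and with `b > 0` a longer document with the same term frequency scores lower.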

Take-away

Probabilistic approach to IR: Introduction

Binary independence model (BIM), the first influential probabilistic model

Okapi BM25, a more modern, better performing probabilistic model

Resources

Chapter 11 of Introduction to Information Retrieval

Resources at http://informationretrieval.org/essir2011

Binary independence model (original paper)

More details on Okapi BM25

Why the Naive Bayes independence assumption often works (paper)

Exercise

Naive Bayes conditional independence assumption: the presence or absence of a word in a document is independent of the presence or absence of any other word (given the query).

Why is this wrong? Good example?

PRP assumes that the relevance of each document is independent of the relevance of other documents.

Why is this wrong? Good example?

