
9/21/2000 Information Organization and Retrieval

Ranking and Relevance Feedback

Ray Larson & Marti Hearst

University of California, Berkeley

School of Information Management and Systems

SIMS 202: Information Organization and Retrieval

9/21/2000 Information Organization and Retrieval

Review

• Inverted files

• The Vector Space Model

• Term weighting

9/21/2000 Information Organization and Retrieval

tf x idf

$w_{ik} = tf_{ik} \cdot \log(N / n_k)$

where:

$T_k$ = term $k$ in document $D_i$

$tf_{ik}$ = frequency of term $T_k$ in document $D_i$

$idf_k$ = inverse document frequency of term $T_k$ in collection $C$

$N$ = total number of documents in the collection $C$

$n_k$ = the number of documents in $C$ that contain $T_k$

$idf_k = \log\left(\frac{N}{n_k}\right)$
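As a rough illustration of the formula above, here is a minimal Python sketch of the tf x idf weight (the function name and the toy numbers are my own; base-10 logs are used to match the IDF examples on the next slide):

```python
import math

def tf_idf_weight(tf_ik, N, n_k):
    """w_ik = tf_ik * log10(N / n_k): term frequency times inverse document frequency."""
    return tf_ik * math.log10(N / n_k)

# Toy numbers: a term occurring 3 times in document i, in a collection of
# 10000 documents where 20 documents contain the term.
print(tf_idf_weight(tf_ik=3, N=10000, n_k=20))  # 3 * 2.698... ≈ 8.1
```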

9/21/2000 Information Organization and Retrieval

Inverse Document Frequency

• IDF provides high values for rare words and low values for common words

For a collection of 10000 documents:

$\log(10000 / 1) = 4$

$\log(10000 / 20) = 2.698$

$\log(10000 / 5000) = 0.301$

$\log(10000 / 10000) = 0$

9/21/2000 Information Organization and Retrieval

Similarity Measures

Simple matching (coordination level match): $|Q \cap D|$

Dice's Coefficient: $\frac{2\,|Q \cap D|}{|Q| + |D|}$

Jaccard's Coefficient: $\frac{|Q \cap D|}{|Q \cup D|}$

Cosine Coefficient: $\frac{|Q \cap D|}{|Q|^{1/2}\,|D|^{1/2}}$

Overlap Coefficient: $\frac{|Q \cap D|}{\min(|Q|,\,|D|)}$
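For concreteness, a small Python sketch of these set-based coefficients applied to query and document term sets (the function name and sample terms are illustrative, not from the slides):

```python
import math

def set_similarities(q_terms, d_terms):
    """Compute the five coefficients above for two sets of terms."""
    Q, D = set(q_terms), set(d_terms)
    common = len(Q & D)
    return {
        "simple_match": common,                          # coordination level match
        "dice": 2 * common / (len(Q) + len(D)),
        "jaccard": common / len(Q | D),
        "cosine": common / math.sqrt(len(Q) * len(D)),
        "overlap": common / min(len(Q), len(D)),
    }

print(set_similarities(["ranking", "relevance", "feedback"],
                       ["relevance", "feedback", "vector", "model"]))
```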

9/21/2000 Information Organization and Retrieval

Vector Space Visualization

9/21/2000 Information Organization and Retrieval

Text Clustering

Clustering is "The art of finding groups in data." -- Kaufman and Rousseeuw

[Figure: documents plotted in a two-dimensional space with axes Term 1 and Term 2, falling into visible groups]

9/21/2000 Information Organization and Retrieval

tf x idf normalization

• Normalize the term weights (so longer documents are not unfairly given more weight)
  – normalize usually means force all values to fall within a certain range, usually between 0 and 1, inclusive.

$w_{ik} = \frac{tf_{ik}\,\log(N / n_k)}{\sqrt{\sum_{k=1}^{t} (tf_{ik})^2\,[\log(N / n_k)]^2}}$

9/21/2000 Information Organization and Retrieval

Vector space similarity (use the weights to compare the documents)

Now, the similarity of two documents is:

$sim(D_i, D_j) = \sum_{k=1}^{t} w_{ik}\,w_{jk}$

This is also called the cosine, or normalized inner product. (Normalization was done when weighting the terms.)

9/21/2000 Information Organization and Retrieval

Vector Space Similarity Measure
combine tf x idf into a similarity measure

$D_i = (d_{i1}, d_{i2}, \ldots, d_{it})$
$Q = (w_{q1}, w_{q2}, \ldots, w_{qt})$, with $w = 0$ if a term is absent

if term weights are normalized:

$sim(Q, D_i) = \sum_{j=1}^{t} w_{qj}\,w_{d_{ij}}$

otherwise normalize in the similarity comparison:

$sim(Q, D_i) = \frac{\sum_{j=1}^{t} w_{qj}\,w_{d_{ij}}}{\sqrt{\sum_{j=1}^{t} (w_{qj})^2}\;\sqrt{\sum_{j=1}^{t} (w_{d_{ij}})^2}}$

9/21/2000 Information Organization and Retrieval

Vector Space with Term Weights and Cosine Matching

[Figure: query Q and documents D1, D2 plotted as term-weight vectors on axes Term A and Term B (scale 0 to 1.0), with the angles between Q and each document vector marked]

$D_i = (d_{i1}, w_{d_{i1}}; d_{i2}, w_{d_{i2}}; \ldots; d_{it}, w_{d_{it}})$
$Q = (q_{i1}, w_{q_{i1}}; q_{i2}, w_{q_{i2}}; \ldots; q_{it}, w_{q_{it}})$

$sim(Q, D_i) = \frac{\sum_{j=1}^{t} w_{q_j}\,w_{d_{ij}}}{\sqrt{\sum_{j=1}^{t} (w_{q_j})^2}\;\sqrt{\sum_{j=1}^{t} (w_{d_{ij}})^2}}$

Q = (0.4, 0.8); D1 = (0.8, 0.3); D2 = (0.2, 0.7)

$sim(Q, D_2) = \frac{(0.4 \times 0.2) + (0.8 \times 0.7)}{\sqrt{[(0.4)^2 + (0.8)^2] \times [(0.2)^2 + (0.7)^2]}} = \frac{0.64}{\sqrt{0.424}} \approx 0.98$

$sim(Q, D_1) = \frac{0.56}{\sqrt{0.584}} \approx 0.73$
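The worked example can be checked with a few lines of Python (a sketch; cosine_sim is a helper name of my own):

```python
import math

def cosine_sim(q, d):
    """Cosine of the angle between two term-weight vectors."""
    dot = sum(qi * di for qi, di in zip(q, d))
    return dot / (math.sqrt(sum(qi * qi for qi in q)) *
                  math.sqrt(sum(di * di for di in d)))

Q, D1, D2 = (0.4, 0.8), (0.8, 0.3), (0.2, 0.7)
print(round(cosine_sim(Q, D1), 2))  # 0.73
print(round(cosine_sim(Q, D2), 2))  # 0.98
```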

9/21/2000 Information Organization and Retrieval

Term Weights in SMART

• In SMART weights are decomposed into three factors:

$w_{kd} = \frac{freq_{kd} \times collect_k}{norm}$

9/21/2000 Information Organization and Retrieval

SMART Freq Components

Binary: $freq_{kd} \in \{0, 1\}$

maxnorm: $\frac{freq_{kd}}{\max(freq_d)}$

augmented: $\frac{1}{2} + \frac{1}{2} \cdot \frac{freq_{kd}}{\max(freq_d)}$

log: $\ln(freq_{kd}) + 1$

9/21/2000 Information Organization and Retrieval

Collection Weighting in SMART

Inverse: $collect_k = \log\frac{NDoc}{Doc_k}$

Squared: $collect_k = \left[\log\frac{NDoc}{Doc_k}\right]^2$

Probabilistic: $collect_k = \log\frac{NDoc - Doc_k}{Doc_k}$

Frequency: $collect_k = 1$

9/21/2000 Information Organization and Retrieval

Term Normalization in SMART

sum: $norm = \sum_{j \in vector} w_j$

cosine: $norm = \sqrt{\sum_{j \in vector} w_j^2}$

fourth: $norm = \sqrt[4]{\sum_{j \in vector} w_j^4}$

max: $norm = \max_{j \in vector} w_j$
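Putting the three SMART factors together, here is a minimal sketch of one possible combination (augmented term frequency, inverse collection weight, cosine normalization); the function names and sample counts are my own, not SMART's:

```python
import math

def augmented_tf(freq, max_freq):
    """Augmented frequency component: 1/2 + 1/2 * freq / max(freq)."""
    return 0.5 + 0.5 * freq / max_freq if freq > 0 else 0.0

def inverse_collection(doc_k, n_doc):
    """Inverse collection weight: log(NDoc / Doc_k)."""
    return math.log(n_doc / doc_k)

def smart_style_weights(term_freqs, doc_counts, n_doc):
    """term_freqs: term -> frequency in this document;
       doc_counts: term -> number of documents containing the term."""
    max_freq = max(term_freqs.values())
    raw = {t: augmented_tf(f, max_freq) * inverse_collection(doc_counts[t], n_doc)
           for t, f in term_freqs.items()}
    norm = math.sqrt(sum(w * w for w in raw.values()))   # cosine normalization
    return {t: w / norm for t, w in raw.items()}

print(smart_style_weights({"ranking": 3, "feedback": 1},
                          {"ranking": 50, "feedback": 200}, 10000))
```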

9/21/2000 Information Organization and Retrieval

Probabilistic Models: Logistic Regression attributes

$X_1 = \frac{1}{M}\sum_{j=1}^{M} \log QAF_{t_j}$ -- Average Absolute Query Frequency

$X_2 = QL$ -- Query Length

$X_3 = \frac{1}{M}\sum_{j=1}^{M} \log DAF_{t_j}$ -- Average Absolute Document Frequency

$X_4 = DL$ -- Document Length

$X_5 = \frac{1}{M}\sum_{j=1}^{M} \log IDF_{t_j}$ -- Average Inverse Document Frequency, where $IDF_t = \log\frac{N}{n_t}$ is the Inverse Document Frequency

$X_6 = \log M$ -- Number of Terms in common between query and document -- logged

9/21/2000 Information Organization and Retrieval

Probabilistic Models: Logistic Regression

Probability of relevance is based on logistic regression from a sample set of documents to determine values of the coefficients. At retrieval the probability estimate is obtained by:

$P(R \mid Q, D) = c_0 + \sum_{i=1}^{6} c_i X_i$

For the 6 X attribute measures shown previously
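A minimal sketch of how such an estimate might be computed at retrieval time, assuming the fitted sum is a log-odds that is mapped to a probability with the logistic function (the coefficient and attribute values below are made up for illustration, not taken from TREC data):

```python
import math

def relevance_probability(x, c):
    """x: attribute values X1..X6; c: coefficients c0..c6 fit by logistic regression."""
    log_odds = c[0] + sum(ci * xi for ci, xi in zip(c[1:], x))
    return 1.0 / (1.0 + math.exp(-log_odds))   # logistic transform of the log-odds

# Illustrative values only:
X = [0.2, 3.0, 1.1, 5.2, 2.4, 0.69]
C = [-3.5, 0.4, -0.1, 0.3, -0.05, 0.5, 1.2]
print(relevance_probability(X, C))
```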

9/21/2000 Information Organization and Retrieval

Probabilistic Models

Advantages:

• Strong theoretical basis

• In principle should supply the best predictions of relevance given available information

• Can be implemented similarly to Vector

Disadvantages:

• Relevance information is required -- or is "guesstimated"

• Important indicators of relevance may not be terms -- though usually only terms are used

• Optimally requires on-going collection of relevance information

9/21/2000 Information Organization and Retrieval

Vector and Probabilistic Models

• Support "natural language" queries
• Treat documents and queries the same
• Support relevance feedback searching
• Support ranked retrieval
• Differ primarily in theoretical basis and in how the ranking is calculated
  – Vector assumes relevance
  – Probabilistic relies on relevance judgments or estimates

9/21/2000 Information Organization and Retrieval

Current use of Probabilistic Models

• Virtually all the major systems in TREC now use the “Okapi BM25 formula” which incorporates the Robertson-Sparck Jones weights…

$w^{(1)} = \log\frac{(r + 0.5)/(R - r + 0.5)}{(n - r + 0.5)/(N - n - R + r + 0.5)}$

9/21/2000 Information Organization and Retrieval

Okapi BM25

• Where:
  – Q is a query containing terms T
  – K is $k_1((1 - b) + b \cdot dl/avdl)$
  – $k_1$, $b$ and $k_3$ are parameters, usually 1.2, 0.75 and 7-1000
  – tf is the frequency of the term in a specific document
  – qtf is the frequency of the term in a topic from which Q was derived
  – dl and avdl are the document length and the average document length measured in some convenient unit
  – $w^{(1)}$ is the Robertson-Sparck Jones weight

$\sum_{T \in Q} w^{(1)}\,\frac{(k_1 + 1)\,tf}{K + tf}\,\frac{(k_3 + 1)\,qtf}{k_3 + qtf}$
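A minimal Python sketch of the per-term BM25 contribution described above; when no relevance information is available (r = R = 0), the Robertson-Sparck Jones weight reduces to the idf-like form used here. Parameter defaults follow the slide; the sample numbers are invented. The document score is the sum of this quantity over all query terms.

```python
import math

def bm25_term(tf, qtf, n, N, dl, avdl, k1=1.2, b=0.75, k3=7.0):
    """Contribution of one query term T to the BM25 score of a document."""
    w1 = math.log((N - n + 0.5) / (n + 0.5))     # RSJ weight with r = R = 0
    K = k1 * ((1 - b) + b * dl / avdl)
    return w1 * ((k1 + 1) * tf) / (K + tf) * ((k3 + 1) * qtf) / (k3 + qtf)

# One term, occurring 3 times in a document of length 120 (average length 100),
# present in 1000 of 100000 documents, and once in the query:
print(bm25_term(tf=3, qtf=1, n=1000, N=100000, dl=120, avdl=100))
```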

9/21/2000 Information Organization and Retrieval

Today

• Logistic regression and Cheshire

• Relevance Feedback

9/21/2000 Information Organization and Retrieval

Logistic Regression and Cheshire II

• The Cheshire II system uses Logistic Regression equations estimated from TREC full-text data.

• Demo (?)

9/21/2000 Information Organization and Retrieval

Querying in an IR System

[Diagram: an Information Storage and Retrieval System. Documents & data are indexed (descriptive and subject) and stored as document representations (Store 2); interest profiles & queries are formulated in terms of descriptors and stored as profiles/search requests (Store 1). The "rules of the game" are the rules for subject indexing plus a thesaurus, which consists of a lead-in vocabulary and an indexing language. Comparison/matching of the two stores produces potentially relevant documents.]

9/21/2000 Information Organization and Retrieval

Relevance Feedback in an IR System

[Diagram: the same Information Storage and Retrieval System as on the previous slide, with one addition: selected relevant documents from the output are fed back into query formulation.]

9/21/2000 Information Organization and Retrieval

Query Modification

• Problem: how to reformulate the query?
  – Thesaurus expansion:
    • Suggest terms similar to query terms
  – Relevance feedback:
    • Suggest terms (and documents) similar to retrieved documents that have been judged to be relevant

9/21/2000 Information Organization and Retrieval

Relevance Feedback

• Main Idea:
  – Modify existing query based on relevance judgements
    • Extract terms from relevant documents and add them to the query
    • and/or re-weight the terms already in the query
  – Two main approaches:
    • Automatic (pseudo-relevance feedback)
    • Users select relevant documents
      – Users/system select terms from an automatically-generated list

9/21/2000 Information Organization and Retrieval

Relevance Feedback

• Usually do both:
  – expand query with new terms
  – re-weight terms in query

• There are many variations
  – usually positive weights for terms from relevant docs
  – sometimes negative weights for terms from non-relevant docs
  – remove terms ONLY in non-relevant documents

9/21/2000 Information Organization and Retrieval

Rocchio Method

$Q_1 = Q_0 + \frac{\beta}{n_1}\sum_{i=1}^{n_1} R_i - \frac{\gamma}{n_2}\sum_{i=1}^{n_2} S_i$

where:

$Q_0$ = the vector for the initial query

$R_i$ = the vector for the relevant document $i$

$S_i$ = the vector for the non-relevant document $i$

$n_1$ = the number of relevant documents chosen

$n_2$ = the number of non-relevant documents chosen

$\beta$ and $\gamma$ tune the importance of relevant and nonrelevant terms (in some studies best set to 0.75 and 0.25)


9/21/2000 Information Organization and Retrieval

Rocchio/Vector Illustration

[Figure: query vectors Q0, Q', Q'' and document vectors D1, D2 plotted on axes "Retrieval" and "Information", scale 0 to 1.0]

Q0 = retrieval of information = (0.7, 0.3)
D1 = information science = (0.2, 0.8)
D2 = retrieval systems = (0.9, 0.1)

Q' = ½ Q0 + ½ D1 = (0.45, 0.55)
Q'' = ½ Q0 + ½ D2 = (0.80, 0.20)

9/21/2000 Information Organization and Retrieval

Example Rocchio Calculation

Relevant docs:

$R_1$ = (0.020, 0.009, 0.020, 0.002, 0.050, 0.025, 0.100, 0.100, 0.120)
$R_2$ = (0.030, 0.00, 0.00, 0.025, 0.025, 0.050, 0.00, 0.00, 0.120)

Non-rel doc:

$S_1$ = (0.030, 0.010, 0.020, 0.00, 0.005, 0.025, 0.00, 0.020, 0.00)

Original Query:

$Q$ = (0.00, 0.00, 0.00, 0.00, 0.500, 0.00, 0.450, 0.00, 0.950)

Constants: $\beta = 0.75$, $\gamma = 0.25$, $n_1 = 2$, $n_2 = 1$

Rocchio Calculation:

$Q_{new} = Q + 0.75\,\frac{R_1 + R_2}{2} - 0.25\,\frac{S_1}{1}$

Resulting feedback query:

$Q_{new}$ = (0.011, 0.000875, 0.002, 0.01, 0.527, 0.022, 0.488, 0.033, 1.04)
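The calculation can be reproduced with a short NumPy sketch (the variable names mirror the slide; the result matches the slide's vector up to rounding):

```python
import numpy as np

Q  = np.array([0.000, 0.000, 0.000, 0.000, 0.500, 0.000, 0.450, 0.000, 0.950])
R1 = np.array([0.020, 0.009, 0.020, 0.002, 0.050, 0.025, 0.100, 0.100, 0.120])
R2 = np.array([0.030, 0.000, 0.000, 0.025, 0.025, 0.050, 0.000, 0.000, 0.120])
S1 = np.array([0.030, 0.010, 0.020, 0.000, 0.005, 0.025, 0.000, 0.020, 0.000])

beta, gamma, n1, n2 = 0.75, 0.25, 2, 1
Q_new = Q + beta * (R1 + R2) / n1 - gamma * S1 / n2
print(Q_new)  # ≈ (0.011, 0.0009, 0.0025, 0.010, 0.527, 0.022, 0.488, 0.033, 1.04)
```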

9/21/2000 Information Organization and Retrieval

Rocchio Method

• Rocchio automatically
  – re-weights terms
  – adds in new terms (from relevant docs)
    • have to be careful when using negative terms

• Rocchio is not a machine learning algorithm

• Most methods perform similarly
  – results heavily dependent on test collection

• Machine learning methods are proving to work better than standard IR approaches like Rocchio

9/21/2000 Information Organization and Retrieval

Probabilistic Relevance Feedback Robertson & Sparck Jones

Document indexing vs. Document Relevance, given a query term t:

                          Relevant (+)    Not relevant (-)    Total
  Indexed (+)                  r              n - r              n
  Not indexed (-)            R - r        N - n - R + r        N - n
  Total                        R              N - R              N

Where N is the number of documents seen.

9/21/2000 Information Organization and Retrieval

Robertson-Sparck Jones Weights

• Retrospective formulation --

$w_t^{new} = \log\frac{r/(R - r)}{(n - r)/(N - n - R + r)}$

9/21/2000 Information Organization and Retrieval

Robertson-Sparck Jones Weights

Predictive formulation:

$w^{(1)} = \log\frac{(r + 0.5)/(R - r + 0.5)}{(n - r + 0.5)/(N - n - R + r + 0.5)}$
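A direct Python transcription of the predictive weight (the function name and the example counts are my own):

```python
import math

def rsj_weight(r, R, n, N):
    """Predictive Robertson-Sparck Jones weight with the 0.5 corrections."""
    return math.log(((r + 0.5) / (R - r + 0.5)) /
                    ((n - r + 0.5) / (N - n - R + r + 0.5)))

# A term occurring in 8 of 10 known relevant documents and in 100 of the
# 10000 documents seen:
print(rsj_weight(r=8, R=10, n=100, N=10000))
```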

9/21/2000 Information Organization and Retrieval

Using Relevance Feedback

• Known to improve results
  – in TREC-like conditions (no user involved)

• What about with a user in the loop?
  – How might you measure this?

9/21/2000 Information Organization and Retrieval

Relevance Feedback Summary

• Iterative query modification can improve precision and recall for a standing query

• In at least one study, users were able to make good choices by seeing which terms were suggested for R.F. and selecting among them

9/21/2000 Information Organization and Retrieval

Alternative Notions of Relevance Feedback

• Find people whose taste is “similar” to yours. Will you like what they like?

• Follow a user's actions in the background. Can this be used to predict what the user will want to see next?

• Track what lots of people are doing. Does this implicitly indicate what they think is good and not good?

9/21/2000 Information Organization and Retrieval

Alternative Notions of Relevance Feedback

• Several different criteria to consider:
  – Implicit vs. Explicit judgements
  – Individual vs. Group judgements
  – Standing vs. Dynamic topics
  – Similarity of the items being judged vs. similarity of the judges themselves

Information Organization and Retrieval

Collaborative Filtering (social filtering)

• If Pam liked the paper, I’ll like the paper

• If you liked Star Wars, you’ll like Independence Day

• Rating based on ratings of similar people
  – Ignores the text, so works on text, sound, pictures etc.
  – But: Initial users can bias ratings of future users

                      Sally   Bob   Chris   Lynn   Karen
  Star Wars             7      7      3       4      7
  Jurassic Park         6      4      7       4      4
  Terminator II         3      4      7       6      3
  Independence Day      7      7      2       2      ?

Information Organization and Retrieval

• Users rate musical artists from like to dislike
  – 1 = detest, 7 = can't live without, 4 = ambivalent
  – There is a normal distribution around 4
  – However, what matters are the extremes

• Nearest Neighbors Strategy: find similar users and predict a (weighted) average of their ratings

• Pearson r algorithm: weight by degree of correlation between user U and user J
  – 1 means very similar, 0 means no correlation, -1 dissimilar
  – Works better to compare against the ambivalent rating (4), rather than the individual's average score

Ringo Collaborative Filtering (Shardanand & Maes 95)

$r_{UJ} = \frac{\sum (U - \bar{U})(J - \bar{J})}{\sqrt{\sum (U - \bar{U})^2\,\sum (J - \bar{J})^2}}$
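A small sketch of a Ringo-style similarity: the Pearson formula above, but computed against the ambivalent rating of 4 rather than each user's mean, as the bullet above suggests (the helper name is my own; the example ratings come from the earlier table):

```python
import math

def ringo_similarity(u, j, pivot=4.0):
    """Pearson-style correlation between two users' ratings, centered on the
    ambivalent rating (4) instead of each user's mean."""
    num = sum((ui - pivot) * (ji - pivot) for ui, ji in zip(u, j))
    den = (math.sqrt(sum((ui - pivot) ** 2 for ui in u)) *
           math.sqrt(sum((ji - pivot) ** 2 for ji in j)))
    return num / den if den else 0.0

# Sally and Bob on Star Wars, Jurassic Park, Terminator II:
print(ringo_similarity([7, 6, 3], [7, 4, 4]))   # ≈ 0.80, fairly similar taste
```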

9/21/2000 Information Organization and Retrieval

Social Filtering

• Ignores the content, only looks at who judges things similarly

• Works well on data relating to "taste"
  – something that people are good at predicting about each other too

• Does it work for topic?
  – GroupLens results suggest otherwise (preliminary)
  – Perhaps for quality assessments
  – What about for assessing if a document is about a topic?

Information Organization and Retrieval

Learning interface agents

• Add agents in the UI, delegate tasks to them

• Use machine learning to improve performance
  – learn user behavior, preferences

• Useful when:
  – 1) past behavior is a useful predictor of the future
  – 2) wide variety of behaviors amongst users

• Examples:
  – mail clerk: sort incoming messages in right mailboxes
  – calendar manager: automatically schedule meeting times?

9/21/2000 Information Organization and Retrieval

Summary

• Relevance feedback is an effective means for user-directed query modification.

• Modification can be done with either direct or indirect user input

• Modification can be done based on an individual’s or a group’s past input.

9/21/2000 Information Organization and Retrieval