WXGB6106 INFORMATION RETRIEVAL
Week 3: RETRIEVAL EVALUATION
INTRODUCTION
Evaluation is necessary.
Why evaluate?
What to evaluate?
How to evaluate?
WHY EVALUATE
Need to know the advantages and disadvantages of using a particular IRS. The user should be able to decide whether he/she wants to use an IRS based on evaluation results.
The user should also be able to decide whether it is cost-effective to use a particular IRS based on evaluation results.
WHAT TO EVALUATE
What can be measured, and should reflect the ability of the IRS to satisfy user needs:
Coverage of the system – to what extent the IRS includes relevant material
Time lag – average interval between the time the user query request is made and the time taken to obtain an answer set
Form of presentation of the output
Effort involved on the part of the user in getting answers to his/her query request
Recall of the IRS – % of relevant material actually retrieved in the answer to a query request
Precision of the IRS – % of retrieved material that is actually relevant
HOW TO EVALUATE?
Various methods available.
EVALUATION
2 main processes in IR:
User query request / information query / retrieval strategy / search request
Answer set / hits
Need to know whether the documents retrieved in the answer set fulfil the user query request. This evaluation process is known as retrieval performance evaluation. Evaluation is based on 2 main components:
Test reference collection
Evaluation measure
EVALUATION
The test reference collection consists of:
A collection of documents
A set of example information requests
A set of relevant documents (provided by specialists) for each information request
2 interrelated measures – RECALL and PRECISION
RETRIEVAL PERFORMANCE EVALUATION
Relevance, Recall and Precision
Parameters defined:
I = information request
R = set of relevant documents
|R| = number of documents in this set
A = document answer set retrieved for the information request
|A| = number of documents in this set
|Ra| = number of documents in the intersection of sets R and A
RETRIEVAL PERFORMANCE EVALUATION
Recall = fraction of the relevant documents (set R) which have been retrieved:
Recall = |Ra| / |R|
Precision = fraction of the retrieved documents (set A) which are relevant:
Precision = |Ra| / |A|
[Figure: Venn diagram of the collection, the relevant documents |R|, the answer set |A|, and their intersection |Ra| – precision and recall for a given example information request]
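A minimal sketch of these two definitions as Python set operations (the document IDs and sets below are made up for illustration; they are not taken from the worked example later in these notes):

def recall(relevant: set, answer: set) -> float:
    """|Ra| / |R|: fraction of the relevant documents that were retrieved."""
    return len(relevant & answer) / len(relevant)


def precision(relevant: set, answer: set) -> float:
    """|Ra| / |A|: fraction of the retrieved documents that are relevant."""
    return len(relevant & answer) / len(answer)


R = {"d1", "d2", "d3", "d4"}           # relevant documents (hypothetical)
A = {"d2", "d4", "d7", "d9", "d11"}    # answer set retrieved for the request (hypothetical)

print(recall(R, A))     # |Ra| / |R| = 2/4 = 0.5
print(precision(R, A))  # |Ra| / |A| = 2/5 = 0.4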
RETRIEVAL PERFORMANCE EVALUATION
Recall and precision are expressed as percentages.
Retrieved documents are sorted by degree of relevance (ranking), so the user sees a ranked list.
RETRIEVAL PERFORMANCE EVALUATION
a. In an IRS with a collection of 100 documents, 10 documents have been identified by specialists as being relevant to a particular query request: d3, d5, d9, d25, d39, d44, d56, d71, d89, d123
b. A query request was submitted and the following documents were retrieved and ranked according to relevance.
RETRIEVAL PERFORMANCE EVALUATION
1. d123*    2. d84     3. d56*    4. d6      5. d8
6. d9*      7. d511    8. d129    9. d187   10. d25*
11. d38    12. d48    13. d250   14. d113   15. d3*
RETRIEVAL PERFORMANCE EVALUATION
c. Only 5 of the retrieved documents (d123, d56, d9, d25, d3) are relevant to the query and match the ones in (a).
d123 ranked 1st:  R = 1/10 × 100% = 10%,  P = 1/1 × 100% = 100%
d56 ranked 3rd:   R = 2/10 × 100% = 20%,  P = 2/3 × 100% = 66%
d9 ranked 6th:    R = 3/10 × 100% = 30%,  P = 3/6 × 100% = 50%
d25 ranked 10th:  R = 4/10 × 100% = 40%,  P = 4/10 × 100% = 40%
d3 ranked 15th:   R = 5/10 × 100% = 50%,  P = 5/15 × 100% = 33%
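The same calculation can be sketched in Python. The relevant set and ranked list are the ones from (a) and (b) above; note that Python's percentage formatting rounds 2/3 to 67% where the slide truncates it to 66%:

relevant = {"d3", "d5", "d9", "d25", "d39", "d44", "d56", "d71", "d89", "d123"}  # from (a)
ranked = ["d123", "d84", "d56", "d6", "d8", "d9", "d511", "d129", "d187",
          "d25", "d38", "d48", "d250", "d113", "d3"]                             # from (b)

found = 0
for rank, doc in enumerate(ranked, start=1):
    if doc in relevant:
        found += 1
        r = found / len(relevant)   # recall after seeing this rank
        p = found / rank            # precision after seeing this rank
        print(f"{doc} ranked {rank}: R = {r:.0%}, P = {p:.0%}")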
A = relevant documents
Ā = non-relevant documents
C = retrieved documents
Ĉ = not retrieved documents
N = total number of documents in the system

                 Relevant    Non-relevant
Retrieved        A ∩ C       Ā ∩ C
Not retrieved    A ∩ Ĉ       Ā ∩ Ĉ

Contingency table
RETRIEVAL PERFORMANCE EVALUATION
Contingency table example:
N = 100
A = 10, Ā = 90
C = 15, Ĉ = 85

                 Relevant        Non-relevant
Retrieved        5               15 − 5 = 10
Not retrieved    10 − 5 = 5      100 − 10 − 10 = 80

Recall = 5/10 × 100% = 50%, Precision = 5/15 × 100% = 33%
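A small sketch of how the four table cells and the resulting recall and precision follow from N, |A|, |C| and |Ra| (the variable names here are only illustrative):

N  = 100   # total number of documents in the system
A  = 10    # relevant documents
C  = 15    # retrieved documents
RA = 5     # relevant documents that were retrieved (|Ra|)

table = {
    ("retrieved",     "relevant"):     RA,               # 5
    ("retrieved",     "non-relevant"): C - RA,           # 15 - 5 = 10
    ("not retrieved", "relevant"):     A - RA,           # 10 - 5 = 5
    ("not retrieved", "non-relevant"): N - A - (C - RA), # 100 - 10 - 10 = 80
}

recall = RA / A      # 5/10 = 50%
precision = RA / C   # 5/15 = 33%
print(table, recall, precision)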
OTHER ALTERNATIVE MEASURES
Harmonic mean – a single measure which combines R & P
E measure – a single measure which combines R & P; the user specifies whether he/she is more interested in R or P (both measures are sketched in the code after this list)
User-oriented measures – based on the user's interpretation of which documents are relevant and which are not
Expected search length
Satisfaction – focuses only on relevant docs
Frustration – focuses only on non-relevant docs
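A minimal sketch of the first two of these measures, assuming the usual textbook formulas F = 2PR/(P + R) for the harmonic mean and E = 1 − (1 + b²)PR/(b²P + R) for the E measure; the slide does not give the formulas or the role of the parameter b, so those are assumptions here:

def harmonic_mean(r: float, p: float) -> float:
    """F = 2PR / (P + R): a single number combining recall and precision."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


def e_measure(r: float, p: float, b: float = 1.0) -> float:
    """E = 1 - (1 + b^2)PR / (b^2*P + R); with b = 1 this equals 1 - F.
    The weighting convention for b (favouring R vs P) varies between references."""
    denom = b * b * p + r
    return 1 - (1 + b * b) * p * r / denom if denom else 1.0


print(harmonic_mean(0.5, 0.33))  # ≈ 0.40
print(e_measure(0.5, 0.33))      # ≈ 0.60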
REFERENCE COLLECTION
IR experimentation is done on test collections.
Example of a test collection: the yearly conference known as TREC – Text Retrieval Conference – is dedicated to experimentation with a large test collection of over 1 million documents; testing is time consuming.
For each TREC conference, a set of reference experiments is designed, and research groups use these reference experiments to compare their IRS.
TREC NIST site – http://trec.nist.gov
REFERENCE COLLECTION
The collection is known as TIPSTER.
The TIPSTER/TREC test collection is composed of:
Documents
A set of example information requests or topics
A set of relevant documents for each example information request
OTHER TEST COLLECTIONS
ADI – documents on information science
CACM – computer science
INSPEC – abstracts on electronics, computer and physics
ISI – library science
Medlars – medical articles
Developed by E.A. Fox for his PhD thesis at Cornell University, Ithaca, New York in 1983 – Extending the Boolean and vector space models of information retrieval with p-norm queries and multiple concept types – http://www.ncstrl.org