MSc Projects – Information Searching
Peter Hancox, Computer Science
Why should you be searching?
Information searching/retrieval is about:
– saving you time by finding ways to solve problems, produce better designs, discover problem domains, benchmark your work;
– learning from other people’s work;
– developing problem-solving skills;
– keeping your examiners happy.
IR is more than Google
Most people’s way of finding information is to use Google.
Google is great for some things - mainly finding undisputed facts.
Google relies on indexing WWW pages - so it is as:
– complete as the WWW;
– accurate as the WWW.
What were the surnames of my grandfathers?
Is the Earth flat? See: http://www.alaska.net/~clund/e_djublonskopf/Flatearthsociety.htm
The Computer Science literature
[Pie chart of literature types: conference papers, journal articles, technical reports, theses, books, other material (e.g. programs), WWW pages. Segment labels: 3%, 7%, 39%, 26%, 5%, 3%, 17%.]
How to know you’re getting quality
How do you, a novice, know you’re reading high quality scientific/technical literature?
Journal papers - the best are “peer-reviewed”
Conference papers - the best are “peer-reviewed”
Books - the best are published by the best publishers, e.g. Oxford, Cambridge, MIT, ….
Technical reports - the best probably come from the best universities and companies …
But how do you know which are the best?
How to know you’re getting quality
The answer is very simple:
Use specialised information retrieval databases that have
– excellent coverage;
– excellent currency.
Some IR theory and practice - 1
There are three kinds of search:
– finding simple facts - use Google (with care)
– current awareness - keeping yourself up-to-date
– retrospective searching - finding some (or all) of the literature on a topic
This lecture is mainly about retrospective searching.
Some IR theory and practice - 2
A document set can be divided into relevant and irrelevant documents:
Some IR theory and practice - 3
A document set can be divided into relevant and irrelevant documents:
Precision = (no. of relevant documents retrieved) / (total no. of docs retrieved)
100/160 = 62.5%
Some IR theory and practice - 4
A document set can be divided into relevant and irrelevant documents:
Recall = (no. of relevant documents retrieved) / (total no. of relevant docs)
100/200 = 50%
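The two measures can be sketched in a few lines of Python. The function and parameter names are illustrative assumptions; the numbers are the ones used in the precision and recall examples (160 documents retrieved, 100 of them relevant, 200 relevant documents in total).

```python
def precision(relevant_retrieved: int, total_retrieved: int) -> float:
    """Fraction of the retrieved documents that are relevant."""
    return relevant_retrieved / total_retrieved


def recall(relevant_retrieved: int, total_relevant: int) -> float:
    """Fraction of all relevant documents that were retrieved."""
    return relevant_retrieved / total_relevant


# Figures from the slides: 160 documents retrieved, 100 of them relevant,
# and 200 relevant documents in the whole collection.
print(f"Precision: {precision(100, 160):.1%}")  # 62.5%
print(f"Recall:    {recall(100, 200):.1%}")     # 50.0%
```

Both measures share the same numerator; only the denominator changes, which is why improving one does not automatically improve the other.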
Some IR theory and practice - 5
The paradox of searching? It seems impossible to get 100% precision and 100% recall.
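One way to see the tension is to read further and further down a ranked result list. The sketch below uses an invented result list (the relevance flags and the total of five relevant documents are assumptions made purely for illustration) and reports precision and recall at several cut-off points.

```python
# Hypothetical ranked result list: True marks a relevant document.
# The data is invented purely to illustrate the trade-off.
ranked_results = [True, True, False, True, False, False, True, False, False, False]
total_relevant = 5  # assumption: one relevant document is never retrieved at all


def precision_recall_at(cutoff: int) -> tuple[float, float]:
    """Precision and recall if only the first `cutoff` results are read."""
    hits = sum(ranked_results[:cutoff])
    return hits / cutoff, hits / total_relevant


for cutoff in (2, 4, 7, 10):
    p, r = precision_recall_at(cutoff)
    print(f"top {cutoff:2d}: precision {p:.0%}, recall {r:.0%}")
```

With this toy data, reading more results raises recall while precision falls, and recall never reaches 100% because one relevant document is not retrieved at all.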
Some IR theory and practice - 6
Bradford’s law of scattering:
Colloquially:
To find all relevant scientific literature on a topic, you have to look in all the literature; to find ~90% of the literature, you only have to look in 10% of the literature.
More formally:
the returns of extending a search for references in science journals diminish exponentially.
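The diminishing returns can be illustrated with the classical zone formulation of Bradford's law, which the slides do not spell out: journals are grouped into zones that each yield the same number of relevant articles, and the number of journals per zone grows roughly as 1 : n : n². The core size, multiplier, and number of zones below are hypothetical values chosen only for the sketch.

```python
def bradford_zones(core_journals: int, multiplier: int, zones: int) -> list[int]:
    """Journals per zone under the classical 1 : n : n^2 ... formulation.

    Each zone is assumed to yield the same number of relevant articles.
    """
    return [core_journals * multiplier**k for k in range(zones)]


# Hypothetical values: 10 core journals, multiplier 5, three zones.
journals_per_zone = bradford_zones(core_journals=10, multiplier=5, zones=3)
total_journals = sum(journals_per_zone)

searched = 0
for zone_number, journals in enumerate(journals_per_zone, start=1):
    searched += journals
    share_of_journals = searched / total_journals
    share_of_articles = zone_number / len(journals_per_zone)
    print(f"zones 1-{zone_number}: {share_of_journals:.0%} of journals "
          f"-> {share_of_articles:.0%} of relevant articles")
```

Under these assumptions, each further zone returns the same number of articles but requires five times as many journals to be searched.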
Some IR theory and practice - 7
Bradford’s law of scattering:
Means that we can concentrate searching on a fairly small subset of the literature and get most results.
Specialised information retrieval databases are designed to retrieve large amounts of literature from the optimum number of journals. Google isn’t designed to do this.
Choosing databases - books
Don’t use Amazon - it only has books currently on sale that it can source.
Use a copyright deposit library:
British Library
Library of Congress
Cambridge UL
Choosing databases - journals and conference papers
The best keyword-based Computer Science services are:
Inspec - http://www.engineeringvillage2.org
ACM Guide to Computing Literature - http://portal.acm.org/guide.cfm
Choosing databases - journals and conference papers
Interdisciplinary services with substantial Computing coverage:
Medline - http://gateway.ovid.com/autologin.html
Compendex/Engineering Index - http://www.engineeringvillage2.org
Choosing databases - journals and conference papers
Single publisher services - perhaps with full text access:
IEEE Xplore - http://ieeexplore.ieee.org/
Inspec - coverage and currency
Includes:
– 3,500 journals – many of them computing science and applications journals
– conference papers – 1,500 conferences added each year
– seems to include reports, theses, etc, but how satisfactory is the coverage?
Journals seem to be completely indexed within ~6 months of publication.
Inspec - indexing
How are the entries indexed?
– Classification scheme
– Controlled language
– Keywords
  • taken from title
  • taken from abstract
  • written by the indexer
Inspec - indexing
ti: coherence
ti: inference
ti: representation
Inspec - searching
Demonstration based on handout.
Science Citation Index - Subject coverage
The scope is so wide as to be multidisciplinary.
It indexes:
– journals - almost 5,300 science journals including at least 200 computing journals and probably more.
It doesn’t directly index:
– conferences
– books
– reports
– theses
Inspec indexes 3,500 mainly relevant journals
Science Citation Index - Comprehensiveness/coverage
Covers many of the principal journals in computing
– has a wide computer science coverage, choosing the most widely respected journals rather than having (e.g.) an engineering bias.
Science Citation Index - Subject overlap
SCI overlaps with several other indexing services.
– Compendex has many of the same core journals - but also has conferences.
– Inspec has many of the same core journals and lots of other journals - and also has conferences.
Science Citation Index - Record content
How much information do the entries contain?
– Basic bibliographic information
– Abstract
– Institution - e.g. University of Birmingham
– Language of original article
Science Citation Index - Indexing
How are the entries indexed?
– Keywords
  • taken from title
  • taken from abstract
  • written by the indexer
– Citations
SCI - searching
Demonstration based on handout.
So what does SCI retrieve?
If you use it as a keyword-based index
Only journals/serials
If you search for citations
Anything that authors cite …
– journal & conference papers, books, theses, technical reports
– letters, WWW pages, newspapers, conversations …
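The idea behind citation searching can be sketched as an inverted lookup: from each indexed article's reference list, build a map from every cited item back to the articles that cite it. This is a toy illustration, not a description of how SCI is implemented; the records and the function name are hypothetical.

```python
from collections import defaultdict

# Hypothetical records: each indexed journal article lists what it cites.
# Cited items need not be journal articles themselves.
indexed_articles = {
    "Article A": ["Book X", "Conference paper Y"],
    "Article B": ["Book X", "Technical report Z"],
    "Article C": ["Conference paper Y", "Article A"],
}


def build_citation_index(articles: dict[str, list[str]]) -> dict[str, list[str]]:
    """Invert 'article -> cited items' into 'cited item -> citing articles'."""
    cited_by = defaultdict(list)
    for article, references in articles.items():
        for cited_item in references:
            cited_by[cited_item].append(article)
    return dict(cited_by)


citation_index = build_citation_index(indexed_articles)
print(citation_index["Book X"])  # articles that cite Book X
```

In this sketch a book or technical report becomes findable through the articles that cite it, even though only journal articles are indexed directly.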
Searching SCI for citations
Points to think about
Does the use of citations improve recall and/or precision?
What criteria are used to include cited items?
– Are items cited because they are relevant?
– Because the author wrote them?
– To criticize an alternative approach?
– To impress readers with the author’s erudition?
The End