1© 2000
Searching the Hidden Internet
When Search Engines Aren’t Enough
A Webcast Workshop
Today’s Webcast Workshop
[Sponsor, producer, and broadcaster logos]
Today’s Presenters
Karen Hartman and Ernest Ackermann
Mary Washington College
Internet/Web Resources, coauthored by Ernest Ackermann and Karen Hartman
Today’s Agenda
• What is the Hidden Internet?
• Why can’t popular search engines reach all the content on the Web?
• What types of information are “hidden”?
• How do we find this information?
Polling Interaction
What percentage of the publicly available Web is currently indexed by popular search engines?
A. About 2%
B. About 30%
C. Between 40-50%
D. More than 75%
What is the Hidden Internet?
• The Hidden Internet, or Invisible Web, is the part of the Internet that is not indexed by the major search engines (e.g., AltaVista, Northern Light, Google, and Lycos)
• This hidden content is usually contained in specialized databases that are linked to the Web
Specialized Databases Provide Faster, More Reliable Results
• Because these databases are smaller and contain selective information, the researcher can focus and limit a search by using particular criteria, such as date, language, and location
• The precision of searching targeted content results in more relevant information
• Databases can provide very current information (often updated daily)
• They tend to be efficient, reliable, and comprehensive sources for research
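As a rough illustration of limiting a search by criteria such as date, language, and location, here is a minimal Python sketch. The Record type, the search function, and the three sample records are all hypothetical and invented for this example; they do not come from any real database mentioned in this workshop.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Record:
    # Hypothetical fields standing in for a specialized database's criteria.
    title: str
    published: date
    language: str
    location: str


def search(records, keyword, language=None, location=None, since=None):
    """Return records whose title contains keyword and that pass each given limit."""
    hits = []
    for r in records:
        if keyword.lower() not in r.title.lower():
            continue
        if language and r.language != language:
            continue
        if location and r.location != location:
            continue
        if since and r.published < since:
            continue
        hits.append(r)
    return hits


# Invented sample records, for illustration only.
records = [
    Record("Bankruptcy law overview", date(2000, 3, 1), "English", "Virginia"),
    Record("Guía de bancarrota", date(1999, 6, 15), "Spanish", "Virginia"),
    Record("Bankruptcy filings report", date(1998, 1, 10), "English", "Ohio"),
]

recent_english = search(records, "bankruptcy", language="English", since=date(1999, 1, 1))
for r in recent_english:
    print(r.title, r.published, r.location)
```

Each optional limit narrows the result set further, which is the kind of focusing a general keyword search box usually cannot offer.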
Why popular search engines can’t reach all the content on the Web
• Content in databases that display pages dynamically in response to queries isn’t usually indexed by search engines (e.g., a HotBot search does not reach the contents of the Medline database)
• Search engines tend to avoid indexing URLs that have dynamic components in them (for example, a “?” or “cgi”)
• Depth of indexing: search engine spiders are sometimes unable to index all pages of a site
• Indexing frequency: sometimes it can take months for a page to be indexed by a search engine, whereas a special database may update its pages daily
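The point about dynamic URL components can be sketched in a few lines of Python. This is only an assumed, simplified rule (skip any URL containing a "?" or a "cgi" path component); real search engine spiders apply their own, more elaborate policies. The function names and the example.com URLs are hypothetical.

```python
from urllib.parse import urlparse


def looks_dynamic(url: str) -> bool:
    """Return True if the URL contains a '?' or has 'cgi' in its path."""
    path = urlparse(url).path.lower()
    return "?" in url or "cgi" in path


def crawlable(urls):
    """Keep only the URLs that this simplified spider would index."""
    return [u for u in urls if not looks_dynamic(u)]


# Hypothetical URLs for illustration.
candidates = [
    "http://www.example.com/about.html",
    "http://www.example.com/cgi-bin/search?term=medline",
    "http://www.example.com/results?page=2",
]

for url in crawlable(candidates):
    print(url)
```

Under this rule only the static page survives, so anything reachable solely through a query form stays out of the index.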
Message Interaction
What’s something you have tried to search for on the Internet, but couldn’t ever find?
Elvis or the Holy Grail don’t count!
What Types of Information May Be Hidden from Search Engines?
• Scholarly journal citations and abstracts
• Library catalogs
• Company financial reports
• Multimedia files
• Dynamically generated Web pages and interactive tools
• Content of Adobe PDF and other formatted files
• Digital collections
• Content in sites that require a login and registration process (including proprietary databases)
Message Interaction
What are some databases and catalogs that you or your students use for research?
Databases and Library Catalogs
• Medline (contains abstracts of the world’s premier medical literature)
• ERIC (Educational Resources Information Center)
• Hoover’s Online (company information)
• Library of Congress Online Catalog
Dynamically Generated Web Content
• Ticketmaster.com: search for events, buy concert tickets
• Anywho.com: search for a person’s telephone number or email address
• http://www.bartleby.com/99/ (Bartlett’s Familiar Quotations)
• Bankrate.com: calculate mortgage rates, CD rates, auto loans, etc.
Polling Interaction
Project Gutenberg is a project in which volunteers choose and type in entire books, turning them into e-texts. Books that are still under copyright cannot be published. This means that you won’t find books published after what date?
A. 1890
B. 1800
C. 1923
D. 1905
Polling Interaction Answer
C. 1923
From Project Gutenberg: “We cannot publish any texts still in copyright. This generally means that our texts are taken from books published pre-1923. It’s more complicated than that, as our Copyright Page explains, but 1923 is a good first rule-of-thumb for the U.S.A.”
Digital Collections
• American Memory Collection
• UC Berkeley’s Sunsite
• The University of Virginia’s E-Text Collection
• Project Gutenberg
Sites Used to Search for Images, Audio, PDF Documents, and Newsgroup Archives
• Images: AltaVista Image Search
• Audio: FindSounds.com
• PDF Documents: Search Adobe PDF Online
• Newsgroup Archives: Deja.com
Log In and Register at These Sites
• New York Times
• Thomas Register
• BioMedNet Journal Collection
• Medical Matrix
• FastWeb
Polling Interaction
If you lived in Richmond, Virginia and needed a lawyer who spoke Spanish, which two tools listed below would be most helpful in your search?
A. Librarian’s Index to the Internet
B. HotBot
C. Google
D. The InvisibleWeb
How to Find Tools that Search the Hidden Internet
• CiteLine Professional
• Direct Search
• Intelliseek’s InvisibleWeb
• The Internet Public Library
• Librarian’s Index to the Internet
• Library Spot
• The Scout Report
• Webdata.com
Here’s an example
• You need a lawyer in Richmond, Virginia who specializes in bankruptcy and who also speaks Spanish
• Keywords to use: lawyer, Richmond, Virginia, bankruptcy, Spanish
Try a popular search engine: AltaVista
Type in: lawyer richmond virginia bankruptcy spanish
Searching AltaVista without search features = 3.3 million results
Try narrowing your search: use specific search features
• Place quotes around phrases
• Place + before necessary terms
• Type the search expression like this: +lawyer +"richmond virginia" +bankruptcy +spanish
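The two rules above (quote phrases, put + before required terms) are mechanical enough to sketch in Python. The build_query helper below is hypothetical and written only for this example; it assembles the query text and does not contact any search engine.

```python
def build_query(terms):
    """Quote multi-word phrases and prefix every term with '+' to mark it required."""
    parts = []
    for term in terms:
        term = term.strip()
        if " " in term:
            # A phrase: wrap it in quotes so the words are matched together.
            term = f'"{term}"'
        parts.append("+" + term)
    return " ".join(parts)


query = build_query(["lawyer", "richmond virginia", "bankruptcy", "spanish"])
print(query)  # prints: +lawyer +"richmond virginia" +bankruptcy +spanish
```

Treating "richmond virginia" as one phrase is what keeps pages about other Richmonds, or about Virginia in general, out of the results.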
Searching AltaVista with search features = 32 results
(none appear to be relevant)
Try using a virtual library
Use the Librarian’s Index to the Internet to increase your chance of finding relevant sources.
Librarian’s Index to the Internet
Martindale-Hubbell Lawyer Locator:
www.martindale.com/locator
Martindale-Hubbell’s Search Form
Results = 2
Try another example
You need to find reliable statistics on how the states rank by the number of people who don’t have health insurance.
Here’s what we tried
• Start with a virtual library (we used Library Spot)
• Look for a statistics category
• Go to a comprehensive statistics site (we chose the University of Michigan)
• Find the health insurance category
• Search the databases listed (we chose the PBS program “The Uninsured in America”)
Where we found the statistics
Click on Mapping It Out: State Statistics
State Health Insurance Data (you must place the cursor over the state you are interested in)
Today’s Agenda
• What is the Hidden Internet?
• Why can’t popular search engines reach all the content on the Web?
• What types of information are “hidden”?
• How do we find this information?
Final Thoughts
Thank you for your participation!
Textbook resources: www.webliminal.com/ernie
Franklin, Beedle & Associates: www.fbeedle.com
A Webcast Workshop