
Post on 20-Dec-2015



James Tam

Computer Searches

Concepts covered

• What is a search engine and how does it work?

• General search tips

• The Big Six search engines

• Other search tools

Much of the material in these lecture notes is based on Search Engines for the World Wide Web by Alfred and Emily

Looking for Information?

Start with the Internet
• World Wide Web
• Newsgroups
• Archived mailing lists

There are potential problems

Search Engines

What is a search engine?

How do they work?
• Search engines may employ spiders

The Internet
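
To make the idea of a spider concrete, here is a minimal sketch, assuming a tiny in-memory "web" instead of real network access. The names `TOY_WEB` and `crawl` and the `*.example` addresses are hypothetical; the notes do not describe how any particular engine implements its spider.

```python
from collections import deque

# Hypothetical toy "web": address -> (page text, outgoing links). No network access.
TOY_WEB = {
    "a.example": ("search engines index pages", ["b.example", "c.example"]),
    "b.example": ("spiders follow links", ["a.example"]),
    "c.example": ("pages link to other pages", []),
}

def crawl(start, web):
    """Follow links breadth-first from `start` and return word -> set of addresses."""
    index = {}
    seen = {start}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        text, links = web[url]
        # Record every word on the page in the index.
        for word in text.lower().split():
            index.setdefault(word, set()).add(url)
        # Queue each linked page that has not been visited yet.
        for link in links:
            if link in web and link not in seen:
                seen.add(link)
                queue.append(link)
    return index

index = crawl("a.example", TOY_WEB)
print(sorted(index["pages"]))  # ['a.example', 'c.example']
```

The resulting index maps each word to the pages that contain it, which is the kind of database a keyword search can then consult.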

Search Engines (Continued)

• Search engines may search human-created databases

Making Your Web Site More Noticeable

Add relevant keywords (Spiders)

Search engine submission (“suggesting your site” to Humans)
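
One common place to put keywords for spiders is an HTML `<meta name="keywords">` tag. Whether a given search engine reads that tag is not stated in these notes, so treat it as an assumption. The sketch below shows how a spider might pull such keywords out of a page; the class name `KeywordExtractor` and the sample page are hypothetical.

```python
from html.parser import HTMLParser

class KeywordExtractor(HTMLParser):
    """Collects the contents of <meta name="keywords" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "keywords":
            content = attrs.get("content") or ""
            # Keywords are assumed to be comma-separated.
            self.keywords += [k.strip() for k in content.split(",") if k.strip()]

# Hypothetical page header used only for illustration.
page = ('<html><head><meta name="keywords" '
        'content="search engines, spiders, lecture notes"></head></html>')
extractor = KeywordExtractor()
extractor.feed(page)
print(extractor.keywords)  # ['search engines', 'spiders', 'lecture notes']
```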

Keywords: The Secret To Effective Searches

Use keywords that are as unique as possible

Run the search using a number of variations

Search only titles

Determine if the search engine is case sensitive
• When searching for proper names, capitalize the first letters

Check your spelling

Re-run previous results

Types Of Searches

Plain English

AND Searches

OR Searches

NOT Searches

NEAR Searches

Plain English Searches (Natural Language Searches)

Easy to formulate the query but may result in too many hits

Plain English Searches (Continued)

Supported by almost all of the Big Six

AskJeeves (www.ask.com)

AND Searches

Tell the search engine that results must include every one of the listed keywords

Precede each keyword with a plus sign "+" or "AND"

Some search engines use AND as the default, others do not
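
The effect of an AND search can be sketched as a filter that keeps only documents containing every keyword. This is a minimal illustration over a hypothetical three-document collection (`DOCS`), not the matching logic of any particular engine.

```python
# Hypothetical documents, keyed by an ID.
DOCS = {
    1: "jaguar speed in the wild",
    2: "jaguar car top speed",
    3: "wild cats of south america",
}

def and_search(keywords, docs):
    """Return the IDs of documents that contain every keyword."""
    return sorted(
        doc_id
        for doc_id, text in docs.items()
        if all(k.lower() in text.lower().split() for k in keywords)
    )

print(and_search(["jaguar", "speed"], DOCS))  # [1, 2]
print(and_search(["jaguar", "wild"], DOCS))   # [1]
```

Each extra keyword can only shrink the result list, which is why AND searches narrow a query.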

OR Searches

Provides broader search results

Tells the search engine to include web pages that include at least one keyword out of a list of many (2+)
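
By contrast, an OR search keeps any document that shares at least one word with the keyword list. A minimal sketch, again over a hypothetical collection (`DOCS`):

```python
# Hypothetical documents, keyed by an ID.
DOCS = {
    1: "vancouver hotels",
    2: "calgary hotels",
    3: "calgary weather",
}

def or_search(keywords, docs):
    """Return the IDs of documents that contain at least one keyword."""
    wanted = {k.lower() for k in keywords}
    return sorted(
        doc_id
        for doc_id, text in docs.items()
        if wanted & set(text.lower().split())
    )

print(or_search(["vancouver", "weather"], DOCS))  # [1, 3]
```

No single document here contains both words, so the same keywords combined with AND would return nothing; OR returns two hits.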

NOT Searches

Precede the excluded keyword with a minus sign "-" or "NOT"
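
The plus and minus prefixes can be sketched as a small query parser. This example assumes that a bare word is treated like a "+" word (AND as the default), which, as noted earlier, is true of only some engines. The function names and the `DOCS` collection are hypothetical.

```python
def parse_query(query):
    """Split a query into required terms (+word or bare word) and excluded terms (-word)."""
    required, excluded = set(), set()
    for token in query.lower().split():
        if token.startswith("-") and len(token) > 1:
            excluded.add(token[1:])
        else:
            word = token.lstrip("+")
            if word:
                required.add(word)
    return required, excluded

def not_search(query, docs):
    """Return IDs of documents with all required terms and none of the excluded ones."""
    required, excluded = parse_query(query)
    hits = []
    for doc_id, text in docs.items():
        words = set(text.lower().split())
        if required <= words and not (excluded & words):
            hits.append(doc_id)
    return sorted(hits)

# Hypothetical documents, keyed by an ID.
DOCS = {
    1: "python snake habitat",
    2: "python programming tutorial",
    3: "snake habitat map",
}

print(not_search("+python -programming", DOCS))  # [1]
```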

NEAR Searches

Tell the search engine to show web pages where keywords appear near each other in the document (within 10 words)
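
A proximity test can be sketched by comparing word positions. The example below assumes "within 10 words" means the two positions differ by at most 10; real engines may count differently. The function name `near` and the `max_gap` parameter are hypothetical.

```python
def near(text, first, second, max_gap=10):
    """Return True if `first` and `second` occur within `max_gap` words of each other."""
    words = text.lower().split()
    pos_first = [i for i, w in enumerate(words) if w == first.lower()]
    pos_second = [i for i, w in enumerate(words) if w == second.lower()]
    return any(abs(i - j) <= max_gap for i in pos_first for j in pos_second)

close = "solar power is cheap"
far = "solar " + "filler " * 12 + "cheap"   # 12 words separate the two keywords

print(near(close, "solar", "cheap"))  # True
print(near(far, "solar", "cheap"))    # False
```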

Using Wildcards "*"

Used to look for variations on particular words

Some search engines allow the wildcard to be placed at the beginning, middle or end of a keyword

Rules of thumb on the use of wildcards
• Use them to find spelling variations
• Use a minimum of three characters before the wildcard¹

¹ This will vary depending upon the particular search engine.
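
Wildcard matching can be tried out with Python's standard `fnmatch` module, where "*" stands for any run of characters. This is only a sketch of the idea: the word list and function names are hypothetical, and `fnmatch` also gives special meaning to "?" and "[", which search engines may not.

```python
from fnmatch import fnmatchcase

def wildcard_matches(pattern, words):
    """Return the words that match `pattern`, ignoring case."""
    return [w for w in words if fnmatchcase(w.lower(), pattern.lower())]

def has_three_char_prefix(pattern):
    """Check the rule of thumb: at least three characters before the wildcard."""
    return "*" in pattern and pattern.index("*") >= 3

# Hypothetical vocabulary used only for illustration.
WORDS = ["color", "colour", "colony", "library", "libraries", "liberty"]

print(wildcard_matches("colo*r", WORDS))   # ['color', 'colour']
print(wildcard_matches("librar*", WORDS))  # ['library', 'libraries']
print(has_three_char_prefix("li*"))        # False
```

The first query shows a wildcard in the middle of a keyword catching a spelling variation; the second shows a wildcard at the end catching singular and plural forms.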


Stopwords

Ignored by search engines because they are too common or are reserved for some special purpose
• Common words
• Reserved words

The search engine can be forced to include the stopwords
• Use quotes
• Use a plus sign
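
The handling of stopwords can be sketched as a query filter that drops them unless they are quoted or prefixed with a plus sign. The stopword list here is a small hypothetical sample; each engine has its own.

```python
import re

# Hypothetical stopword list: a few common words plus reserved operator words.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "or", "not"}

def effective_terms(query):
    """Return the terms an engine would actually search for."""
    terms = []
    # Each match is either a quoted phrase (group 1) or a single token (group 2).
    for phrase, word in re.findall(r'"([^"]*)"|(\S+)', query):
        if phrase:
            terms.append(phrase.lower())        # quotes: keep the phrase whole
        elif word.startswith("+") and len(word) > 1:
            terms.append(word[1:].lower())      # plus sign: forced inclusion
        elif word and word.lower() not in STOPWORDS:
            terms.append(word.lower())
    return terms

print(effective_terms("the history of the web"))
# ['history', 'web']
print(effective_terms('+the who "to be or not to be"'))
# ['the', 'who', 'to be or not to be']
```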

Topic Directories

Searching within a web page

Opening A New Browser

The Big Six

AltaVista (www.altavista.com)

Google (www.google.com)

HotBot (www.hotbot.com)

Lycos (www.lycos.com)

Northern Light (www.northernlight.com)

Yahoo (www.yahoo.com)

Comparing The Big Six

Search Engine                Number of web pages in database    Percentage of web in database¹
Google                       1.2 billion                        57%
Yahoo (powered by Google)    1.2 billion                        57%
Lycos                        575 million                        27%
HotBot                       500 million                        24%
AltaVista                    350 million                        17%
Northern Light               330 million                        16%

¹ Based upon figures from January 2001 and an estimate of 2 billion web pages in existence, from www.searchenginewatch.com

Self-reported sizes

But size isn't everything!


AltaVista

Types of searches
• Logical OR
• Date
• Field
• Geographic
• Wildcards
• Language
• Case sensitive
• Proximity
• Weighted

Babel Fish

Obscure facts and figures

Dead links

AltaVista (Continued)

Ranking of search results
• Appearance in the title
• Appearance near the beginning of the document
• Links to related content
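
The first two ranking factors can be illustrated with a toy scoring function. This is only a sketch: the weights are invented for the example, the third factor (links to related content) is left out, and nothing here reflects AltaVista's actual algorithm. The `PAGES` data and the name `score` are hypothetical.

```python
def score(page, keyword):
    """Score a page: a bonus for a title match plus a bonus for an early body match."""
    keyword = keyword.lower()
    points = 0.0
    if keyword in page["title"].lower().split():
        points += 2.0  # hypothetical weight for appearing in the title
    body = page["body"].lower().split()
    if keyword in body:
        # 1.0 if the keyword is the first word, shrinking toward 0 near the end.
        points += 1.0 - body.index(keyword) / len(body)
    return points

# Hypothetical pages used only for illustration.
PAGES = [
    {"url": "a.example", "title": "Gardening tips",
     "body": "weather notes and then some tulips"},
    {"url": "b.example", "title": "Tulips in spring",
     "body": "tulips need sun and water"},
    {"url": "c.example", "title": "Flower shop",
     "body": "tulips roses and lilies"},
]

ranked = sorted(PAGES, key=lambda p: score(p, "tulips"), reverse=True)
print([p["url"] for p in ranked])  # ['b.example', 'c.example', 'a.example']
```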


Google

Types of searches
• Logical OR
• Language
• Domain
• Type of file
• Date
• Not case sensitive
• No wildcards
• Specifies stopwords


Caches web pages

"I'm Feeling Lucky" feature

Google (Continued)

Ranking of search results
• By the number of links
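
Ranking by number of links can be sketched as counting how many other pages link to each page. This assumes "number of links" means inbound links and uses a plain count, which is a simplification rather than Google's actual ranking method. The link graph `LINKS` and the function name are hypothetical.

```python
from collections import Counter

# Hypothetical link graph: page -> pages it links to.
LINKS = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["c.example"],
    "c.example": [],
    "d.example": ["c.example", "b.example"],
}

def inbound_counts(links):
    """Count, for each page, how many distinct other pages link to it."""
    counts = Counter({page: 0 for page in links})
    for source, targets in links.items():
        for target in set(targets):      # count each linking page once
            if target != source:         # ignore self-links
                counts[target] += 1
    return counts

counts = inbound_counts(LINKS)
# Most-linked pages first; ties broken alphabetically.
ranked = sorted(counts, key=lambda page: (-counts[page], page))
print(ranked)  # ['c.example', 'b.example', 'a.example', 'd.example']
```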


HotBot

Types of searches
• Logical OR
• Case sensitive
• Wildcard searches
• Language
• Date
• Domain
• Geographic region
• Link searches
• Type of file
• Must contain, should contain, should not contain

Graphical control of searches

HotBot (Continued)

Ranking of search results
• Having the keyword(s) in the title
• Number of occurrences of the keyword


Lycos

Types of searches
• Logical AND
• Multimedia searches
• Must include/should include, exclude
• Link searches
• No stopwords
• Not case sensitive
• No searches by date
• No searches by wildcard

Kid's search site
• www.lycoszone.com

Lycos (Continued)

Ranking of search results
• "Popularity" of site
• Occurrences of keyword

Northern Light

Types of searches
• Logical AND
• Special case sensitive search
• Wildcard
• Singular and plural
• Stopwords

WWW and a special database

Free search alerts

Customized search folders

Northern Light (Continued)

Ranking of search results
• By the number of links
• Keyword frequency
• Date of the document
• Keyword appearing in title


Yahoo

Searches
• Logical OR
• Date added
• Wildcards
• Not case-sensitive

Searches Yahoo directories and Google database

Extensive classification

Yahoo (Continued)

Ranking of search results
• Results in the Yahoo directory come before Google results
• Ranking in the Yahoo directory is determined by:
  - The number of keywords matched
  - Exact word matches
  - Location of the word in the web page

Summary of The Big Six and What They Do Best¹

AltaVista
• Obscure facts and figures
• Babel Fish

Google
• Big!
• Often produces relevant search results
• Caches web pages

HotBot
• Multimedia
• Ease of use

¹ From Search Engines for the World Wide Web by Alfred and Emily

Summary of The Big Six and What They Do Best (Continued)

Lycos
• Multimedia
• Kid's zone

Northern Light
• Search on the web and special databases

Yahoo
• The most extensive web directory

Metasearch engines

Search on multiple search engines automatically

Examples
• www.metacrawler.com
• www.dogpile.com
• www.profusion.com
• www.search.com
• www.mamma.com

Drawbacks
• Searches occur in the simplest form
• Timeouts
• Number of results returned
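
A metasearch engine can be sketched as a function that sends one query to several engines and merges the answers. The two "engines" below are hypothetical stand-ins that return canned addresses, and the `per_engine` cap is an assumed way of illustrating the limit on results returned; no real service is contacted.

```python
from itertools import zip_longest

def engine_one(query):
    """Hypothetical stand-in for a search engine; returns canned results."""
    return ["a.example", "b.example", "c.example"]

def engine_two(query):
    """Second hypothetical stand-in with partly overlapping results."""
    return ["b.example", "d.example"]

def metasearch(query, engines, per_engine=2):
    """Query every engine, keep its top `per_engine` hits, and merge without duplicates."""
    result_lists = [engine(query)[:per_engine] for engine in engines]
    merged, seen = [], set()
    # Interleave: every engine's first hit, then every engine's second hit, and so on.
    for tier in zip_longest(*result_lists):
        for url in tier:
            if url is not None and url not in seen:
                seen.add(url)
                merged.append(url)
    return merged

print(metasearch("tulips", [engine_one, engine_two]))
# ['a.example', 'b.example', 'd.example']
```

Note that `c.example` never appears: it was the third hit of the first engine and was cut off by the per-engine cap, one way the drawback about the number of results returned can show up.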

Other (Task-Specific) Search Tools

Products
• Amazon: www.amazon.com
• CDNOW: www.cdnow.com
• Consumer World: www.consumerworld.org
• CNET Shareware.com: www.shareware.com
• ZDNet: www.zdnet.com

Health
• CDC: www.cdc.gov

Other (Task-Specific) Search Tools (Continued)

Food
• CuisineNet Menus Online: www.cuisinenet.com
• Epicurious Food: www.epicurious.com
• Martha Stewart: www.marthastewart.com

Miscellaneous
• Expedia: www.expedia.com
• Internet Movie Database: www.imdb.com
• Monster: www.monster.ca
• Workopolis: www.workopolis.com

Summary

What is a search engine?

How do search engines gather information for their databases?

Types of Searches
• By keyword
• Logical
• Plain English
• Wildcards

Stopwords and searches.

Browsing topic directories.

What are the Big Six search engines?

Metasearch engines.

Task-specific search tools