1
Search Engine Case Study
Presented by Alan, Aida, Jonathan & Stephen
2
History
Incorporated in 1996 and based in Emeryville, California.
Created by: Garrett Gruener, venture capitalist and founder of Virtual Microsystems, and David Warthen, CTO and creator of Ask Jeeves' natural language technology.
In April 1997, Ask Jeeves launched the first of its advertiser-supported public Web sites, www.askjeeves.com.
3
Web Properties & International Sites
Web properties:
– Ask Jeeves at Ask.com
– Ask Jeeves for Kids at AJKids.com
– DirectHit.com and eTour.com
International sites:
– Pregunta.com (Español)
– Ask.co.uk (United Kingdom)
4
Overview
What does it do?
– combines natural-language parsing software, data-mining processes, and knowledge-base creation and maintenance tools with the strengths and capabilities of human editors.
What does that mean?
– user-relevancy ranking algorithms rate websites according to how users interact with the content.
– Ask Jeeves' editors and popularity technology capture human judgments to provide useful, relevant information.
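The user-relevancy idea above can be illustrated with a small click-popularity ranker. This is a minimal sketch under stated assumptions: the `ClickLog` class, the 30-second dwell-time threshold, and the vote-count scoring are hypothetical and are not Ask Jeeves' or Direct Hit's actual method; the sketch only shows how user interaction data can reorder results for a query.

```python
from collections import defaultdict


class ClickLog:
    """Hypothetical click log: counts 'satisfied' clicks per (query, URL) pair."""

    def __init__(self, min_dwell_seconds: float = 30.0):
        # Assumption: a click counts as a vote only if the user stayed on the page
        # at least this long (a stand-in for "how users interact with the content").
        self.min_dwell_seconds = min_dwell_seconds
        self.votes = defaultdict(int)

    def record(self, query: str, url: str, dwell_seconds: float) -> None:
        """Store one click; short visits are ignored."""
        if dwell_seconds >= self.min_dwell_seconds:
            self.votes[(query.lower(), url)] += 1

    def rerank(self, query: str, candidates: list[str]) -> list[str]:
        """Order candidate URLs by vote count, most votes first.

        Ties keep their original order because Python's sort is stable.
        """
        q = query.lower()
        return sorted(candidates, key=lambda url: -self.votes[(q, url)])


log = ClickLog()
log.record("history of muzak", "b.example", 120)
log.record("history of muzak", "b.example", 45)
log.record("history of muzak", "a.example", 5)  # too short: not counted
log.record("history of muzak", "c.example", 60)
print(log.rerank("History of Muzak", ["a.example", "b.example", "c.example"]))
# prints ['b.example', 'c.example', 'a.example']
```

Counting only longer visits is one simple way to separate "the user found this useful" from "the user clicked and left"; a real system would also need to guard against spam clicks.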
5
How does this work for the user?
Input from the user:
– Pose questions in plain English.
Output from Jeeves:
– Ask Jeeves editorially selected answers
– Automated Search Results
– Sponsored Links
– Metasearch Results
– Online Communities
6
Interesting Facts & Information
Tips on searching:
– http://static.wc.ask.com/docs/help/help_searchtips.html
Awards & current news:
– http://static.wc.ask.com/docs/about/aboutawards.html
– http://static.wc.ask.com/docs/about/media.html
7
Further analysis of architecture
Jeeves’ three main results sections:
– Editorially selected answers
– Subject-specific popularity results (a.k.a. automated search results)
– Metasearch results
8
Editorially selected answers
The Ask Jeeves editorial staff have manually created an extensive knowledge base of question/answer sets.
These question/answer sets are the first links that appear at the top of the results page. The links point to webpages that are thought to contain the exact answer to the question asked.
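One way to picture the editorial knowledge base is as a table of hand-written template questions mapped to answer links, with an incoming question matched to the closest template. The sketch below assumes a simple word-overlap (Jaccard) score, a hypothetical `KNOWLEDGE_BASE` with placeholder example.com URLs, and a hypothetical `best_answer` helper; Ask Jeeves' actual natural-language matching is not described in these slides.

```python
import re

# Hypothetical editor-built entries: template question -> answer page.
KNOWLEDGE_BASE = {
    "where can i find the history of muzak": "http://example.com/muzak-history",
    "how do i apply for a federal firearms license": "http://example.com/ffl-application",
    "where is radioactive waste stored": "http://example.com/waste-sites",
}

STOPWORDS = {"a", "an", "the", "of", "for", "i", "do", "is", "can", "where", "how", "what"}


def tokens(text: str) -> set[str]:
    """Lowercase, split on non-letters, and drop common function words."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def best_answer(question: str, threshold: float = 0.3):
    """Return (template, url) for the most similar template, or None if nothing is close."""
    q = tokens(question)
    best, best_score = None, 0.0
    for template, url in KNOWLEDGE_BASE.items():
        t = tokens(template)
        union = q | t
        score = len(q & t) / len(union) if union else 0.0  # Jaccard similarity
        if score > best_score:
            best, best_score = (template, url), score
    return best if best_score >= threshold else None


print(best_answer("What is the history of Muzak?"))
print(best_answer("Who won the 1998 World Series?"))  # no close template -> None
```

The threshold is what lets the system fall back to automated or metasearch results when a question is not in the database.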
9
Editorial Standards
Editorially selected sites that contain ‘answers’ must adhere to the following:
– load quickly and be polished, easy to read, and easy to navigate.
– be well maintained and updated regularly.
– offer accurate information that answers the user's query.
– offer other links with information related to the user's query.
– demonstrate credibility by providing author and source citations and contact information.
10
Subject-specific popularity results
AskJeeves acquired its subject-specific popularity search process from Teoma Technologies. Teoma's technology uses three methods to acquire meaningful results:
– Individual web pages
– Web pages grouped by topic
– Expert links
AskJeeves only implements the first two search processes created by Teoma.
11
Implementation
Teoma's technology uses compact mathematical modeling of the Web's structure to generate dynamic queries. After searching using criteria such as popularity and text analysis, it applies dynamic topic clustering, subject-specific link analysis, and expert identification. Dynamic topic clustering looks at the Web from a local perspective, which enables Teoma to understand the subject matter of Web pages.
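Teoma's exact algorithm is not given here, but subject-specific link analysis and expert identification resemble the hubs-and-authorities idea of Kleinberg's HITS algorithm: within a topic-local set of pages, authorities are pages that many good hubs link to, and hubs (roughly, "experts") are pages that link to many good authorities. The following is a minimal HITS sketch on a hypothetical link graph; treating it as Teoma's method is an assumption.

```python
import math


def hits(graph: dict[str, list[str]], iterations: int = 50) -> tuple[dict[str, float], dict[str, float]]:
    """Compute HITS hub and authority scores for a small directed link graph.

    graph maps each page to the pages it links to (a hypothetical topic-local subgraph).
    """
    nodes = set(graph) | {v for targets in graph.values() for v in targets}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority score: sum of the hub scores of pages that link in.
        auth = {n: 0.0 for n in nodes}
        for src, targets in graph.items():
            for dst in targets:
                auth[dst] += hub[src]
        norm = math.sqrt(sum(a * a for a in auth.values())) or 1.0
        auth = {n: a / norm for n, a in auth.items()}
        # Hub score: sum of the authority scores of the pages it links to.
        hub = {n: sum(auth[d] for d in graph.get(n, [])) for n in nodes}
        norm = math.sqrt(sum(h * h for h in hub.values())) or 1.0
        hub = {n: h / norm for n, h in hub.items()}
    return hub, auth


# Hypothetical pages about Muzak: two link-list "expert" pages, one blog, three content pages.
links = {
    "expert1": ["muzak-history", "muzak-corp", "elevator-music"],
    "expert2": ["muzak-history", "muzak-corp"],
    "blog": ["muzak-history"],
}
hub, auth = hits(links)
print(max(auth, key=auth.get))  # muzak-history: the strongest authority
print(max(hub, key=hub.get))    # expert1: the strongest hub ("expert")
```

Because the graph is built only from pages on one topic, the scores reflect authority within that subject rather than across the whole Web, which is the "local perspective" the slide describes.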
12
Metasearch results
AskJeeves gives users the ability to send their query to a number of third-party search engines, including:
– Looksmart.com
– About.com
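Combining lists from several engines requires merging and de-duplicating them. The slides do not say how Ask Jeeves merged metasearch results, so the sketch below uses reciprocal rank fusion (a published, commonly used fusion method) as an assumed stand-in. The result lists are hypothetical placeholders, not real Looksmart.com or About.com responses.

```python
from collections import defaultdict


def reciprocal_rank_fusion(result_lists: dict[str, list[str]], k: int = 60) -> list[str]:
    """Merge ranked URL lists from several engines into one de-duplicated ranking.

    Each URL scores sum(1 / (k + rank)) over the engines that returned it (rank starts
    at 1), so URLs ranked near the top by several engines rise to the top of the merge.
    """
    scores: dict[str, float] = defaultdict(float)
    for urls in result_lists.values():
        for rank, url in enumerate(urls, start=1):
            scores[url] += 1.0 / (k + rank)
    # Highest fused score first; ties broken alphabetically for a stable output.
    return sorted(scores, key=lambda u: (-scores[u], u))


# Hypothetical responses from two third-party engines for the same query.
responses = {
    "looksmart": ["x.example", "y.example", "z.example"],
    "about": ["y.example", "w.example"],
}
print(reciprocal_rank_fusion(responses))
# prints ['y.example', 'x.example', 'w.example', 'z.example']
```

Rank-based fusion avoids comparing raw relevance scores, which different engines compute on incompatible scales.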
13
AskJeeves vs. UNCA Library
VS.
14
UNCA Library (against Jeeves)
Strengths
– ability to limit results by language, location, and year.
– can search by author, subject, periodical title, author/title, and call number.
Weaknesses
– the library search shows where to find items but cannot show their full text, because the materials are in print.
15
AskJeeves (against Library)
Strengths
– can use natural-language queries.
– offers alternative search terms.
– offers a browsable subject index.
– uses a spell checker.
– has a message board where queries can be posted.
Weaknesses
– unable to narrow a search.
16
AskJeeves vs. Google
VS.
17
Google (against Jeeves)
Strengths
– simple, stripped-down design.
– fast.
– cache option.
– ranking by authority.
– good navigation within search results.
Weaknesses
– ranking by authority leaves out new or specialized pages.
– no page summary.
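The weakness that authority-based ranking leaves out new or specialized pages follows from how link-based scores are computed: a page nobody links to yet receives only the baseline "teleport" share. The sketch below is the textbook PageRank power iteration with damping factor 0.85 on a hypothetical four-page graph; Google's production ranking combines many more signals and is not public.

```python
def pagerank(graph: dict[str, list[str]], damping: float = 0.85, iterations: int = 100) -> dict[str, float]:
    """Textbook PageRank by power iteration. graph maps page -> pages it links to."""
    nodes = sorted(set(graph) | {v for targets in graph.values() for v in targets})
    n = len(nodes)
    rank = {p: 1.0 / n for p in nodes}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in nodes}  # teleport share every page gets
        for p in nodes:
            out = graph.get(p, [])
            if out:
                share = damping * rank[p] / len(out)
                for q in out:
                    new[q] += share
            else:
                # Dangling page (no out-links): spread its rank evenly over all pages.
                for q in nodes:
                    new[q] += damping * rank[p] / n
        rank = new
    return rank


# Hypothetical graph: "old" is well linked; "new" has no inbound links yet.
web = {"a": ["old"], "b": ["old"], "old": ["a", "b"], "new": ["old"]}
ranks = pagerank(web)
print(round(ranks["new"], 4))     # only the teleport share: (1 - 0.85) / 4
print(max(ranks, key=ranks.get))  # old
```

However relevant "new" might be, its score stays at the floor until other pages link to it, which is why freshly published or niche pages can rank poorly.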
18
Jeeves (against Google)
Strengths
– simple English queries for beginning users.
– ‘answers’ prompt the user with questions to help refine the search.
– popularity technology makes common questions faster and easier to answer.
– search results include a short summary of each page.
– a search may be posted for other Ask Jeeves users to respond to.
19
Jeeves (against Google) (cont.)
Weaknesses
– slow download.
– distracting advertisements.
– paid top placements give dubious results.
– no advanced search.
– questions may not be in the database.
– poor navigation.
– links are displayed inside an Ask Jeeves frame.
20
Engine Reviews
Features compared: FAQ, Pay for Rank, Page Summary, Adult Filter, Group by Topic, Case, Phrase Search, Proximity, Stem
AskJeeves: Yes Yes Yes Yes Yes No Yes No Yes
Google: No No No No No Yes Yes No
UNCA Library: No Yes No No No Yes Yes No
21
Engine Reviews
Operators compared: Boolean, AND, Exclude, OR, Wildcards
Ask Jeeves: Yes No
Google: Yes + - No
UNCA Library: Yes AND AND NOT OR Yes + -
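The operator comparison above (AND, exclusion with "-" or AND NOT, OR) corresponds to set operations on an inverted index. This is a minimal sketch, assuming a tiny hypothetical document collection and a simplified query format (lists of required, optional, and excluded terms) rather than any of these engines' real query parsers; `build_index` and `boolean_search` are hypothetical names.

```python
from collections import defaultdict

# Hypothetical mini-collection.
DOCS = {
    1: "history of muzak and elevator music",
    2: "muzak corporation business news",
    3: "elevator safety regulations",
    4: "history of elevator design",
}


def build_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Map each lowercase term to the set of document IDs containing it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def boolean_search(index, all_of=(), any_of=(), none_of=()) -> set[int]:
    """AND over all_of (like '+'), OR over any_of, then remove none_of (like '-' or AND NOT)."""
    result = set().union(*index.values())  # start from every indexed document
    for term in all_of:
        result &= index.get(term, set())
    if any_of:
        result &= set().union(*(index.get(t, set()) for t in any_of))
    for term in none_of:
        result -= index.get(term, set())
    return result


idx = build_index(DOCS)
print(boolean_search(idx, all_of=["history"], none_of=["muzak"]))  # history -muzak -> {4}
print(boolean_search(idx, any_of=["muzak", "safety"]))            # muzak OR safety -> {1, 2, 3}
```

Wildcards and phrase or proximity search need more than document-level sets (for example, a term list for prefix matching or word positions), which is one reason engines differ in which operators they offer.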
22
Standard Query Comparison
Our four standardized queries were:
– Q1: “Terrorism in the US”
– Q2: “Radioactive waste disposal locations”
– Q3: “History of Muzak”
– Q4: “Federal Firearms License Application”
23
Q1: “Terrorism in the US”
Engine          Total Docs Returned   Relevant Docs Returned   Precision
Ask Jeeves      ~*                    18                       NA
Google          1,720,000             18 (out of 20)           0.90*
UNCA Library    5                     5                        1.0
* Ask Jeeves returns an indefinite number of documents, so its precision cannot be calculated. Precision for Google is based on the first 20 documents.
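The precision values in these tables follow the standard definition: relevant documents retrieved divided by documents retrieved. For Google, the denominator is the first 20 results (precision at 20). Applied to Q1:

```latex
\begin{aligned}
\text{Precision} &= \frac{\lvert \text{relevant} \cap \text{retrieved} \rvert}{\lvert \text{retrieved} \rvert},
\qquad P@20 = \frac{\text{relevant documents among the first 20}}{20} \\
\text{Google (Q1): } P@20 &= \frac{18}{20} = 0.90,
\qquad \text{UNCA Library (Q1): } \frac{5}{5} = 1.0
\end{aligned}
```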
24
Q2: “Radioactive waste disposal locations”
Engine          Total Docs Returned   Relevant Docs Returned   Precision
Ask Jeeves      ~*                    20                       NA
Google          27,700                15 (out of 20)           0.75*
UNCA Library    14                    12                       0.857
* Ask Jeeves returns an indefinite number of documents, so its precision cannot be calculated. Precision for Google is based on the first 20 documents.
25
Q3: “History of Muzak”
Engine          Total Docs Returned   Relevant Docs Returned   Precision
Ask Jeeves      ~*                    2                        NA
Google          8,730                 11 (out of 20)           0.55*
UNCA Library    2                     1                        0.50
* Ask Jeeves returns an indefinite number of documents, so its precision cannot be calculated. Precision for Google is based on the first 20 documents.
26
Q4: “Federal Firearms License Application”
Engine          Total Docs Returned   Relevant Docs Returned   Precision
Ask Jeeves      ~*                    2                        NA
Google          32,500                15 (out of 20)           0.75*
UNCA Library    12                    3                        0.25
* Ask Jeeves returns an indefinite number of documents, so its precision cannot be calculated. Precision for Google is based on the first 20 documents.
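The precision column across the four query tables can be recomputed from the relevant and retrieved counts reported for Google (first 20 results) and the UNCA Library. This short sketch reproduces that arithmetic; Ask Jeeves is omitted because its total is indefinite, and the `RESULTS` dictionary and `precision` helper are illustrative names, not part of the original study.

```python
# (relevant, retrieved) counts copied from the four query tables.
# Google's "retrieved" is the first 20 results (precision at 20).
RESULTS = {
    "Q1": {"Google": (18, 20), "UNCA Library": (5, 5)},
    "Q2": {"Google": (15, 20), "UNCA Library": (12, 14)},
    "Q3": {"Google": (11, 20), "UNCA Library": (1, 2)},
    "Q4": {"Google": (15, 20), "UNCA Library": (3, 12)},
}


def precision(relevant: int, retrieved: int) -> float:
    """Fraction of retrieved documents judged relevant."""
    if retrieved == 0:
        raise ValueError("precision is undefined when nothing is retrieved")
    return relevant / retrieved


for query, engines in RESULTS.items():
    row = ", ".join(f"{name} {precision(*counts):.3f}" for name, counts in engines.items())
    print(f"{query}: {row}")
```

Each printed value matches the corresponding table entry (for example, 12/14 rounds to 0.857 for the UNCA Library on Q2).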
27
Conclusions
Due to its design, if you know what you are looking for and want a definitive answer, askjeeves.com might be a good place to start.
Tip: For best results, ask your query as a specific question.
28
Conclusions
If you want a great deal of general information on a specific topic but do not know what questions to ask, Google is an excellent choice.
Tip: For best results, start broad and use the search within results feature to narrow the scope of your search.
29
Conclusions
If you want more localized, relevant information on a topic that has been hand-picked by human experts in the field of categorization, the UNCA Library would be a good option.
Tip: For best results, use all features when searching by keyword.
30
Sources
– http://www.infoworld.com/articles/hn/xml/02/01/07/020107hnjeeves.xml
– http://searchenginewatch.com/sereport/01/10-ask.html
– http://www.searchenginewatch.com/sereport/01/07-teoma.html
– http://www.infotoday.com/newsbreaks/nb010820-2.htm
– http://www.teoma.com/help.html
– http://static.wc.ask.com/docs/about/policy.html
– http://sp.ask.com/docs/about/whatisaskjeeves.html
– http://wncln.appstate.edu/
– http://www.google.com