8/3/2019 03 Search Engine
IC0102 Web-based Information Systems
Web Search Engines
A/Prof. Yang Zhonghua
Web Search Engine / 2 (59) A/Prof. Yang, Zhonghua
Web Search Engines
The first law of e-commerce is that if users cannot find the product, they cannot buy it either.
Jakob Nielsen
Internet search engines
Internet search engines are special sites on the Web that are designed to help people find information stored on other sites.
There are differences in the ways various search engines work, but they all perform three basic tasks:
1. They search the Internet, or select pieces of the Internet, based on important words.
2. They keep an index of the words they find, and where they find them.
3. They allow users to look for words or combinations of words found in that index.
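As a rough illustration of the three tasks, here is a minimal Python sketch. It is not how any real engine works: task 1 (searching the Internet) is stood in for by a hard-coded dictionary of hypothetical example.com pages, task 2 builds an index mapping each word to the URLs it was found on, and task 3 answers a query by intersecting those URL sets (an implicit AND). The function names are made up for this example.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Task 2: map each word to the set of URLs where it was found."""
    index: dict[str, set[str]] = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Task 3: return the URLs that contain every word of the query (implicit AND)."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Task 1 (finding pages) is replaced by a fixed set of hypothetical pages.
pages = {
    "http://example.com/ski": "Ski jackets and ski boots for sale",
    "http://example.com/dvd": "Sony DVD player reviews",
    "http://example.com/sony": "Sony ski jackets",
}
index = build_index(pages)
print(sorted(search(index, "ski jackets")))
# ['http://example.com/ski', 'http://example.com/sony']
```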
Google search is the world's most popular search engine
Searches Per Day: Top 5 Engines in the United States, March 2006
Engine    Searches per day (millions)   Searches per month (millions)
Google    91                            2,733
Yahoo     60                            1,792
MSN       28                            845
AOL       16                            486
Ask       13                            378
Others    6                             166
Total     213                           6,400
Top U.S. Search Providers by Searches, May 2007
Provider            Searches (000)   Share of total searches (%)
Google              4,033,277        56.3
Yahoo               1,540,949        21.5
MSN/Windows Live    605,400          8.4
AOL                 381,961          5.3
Ask.com             142,418          2.0
My Web Search       61,784           0.9
Comcast             34,908           0.5
EarthLink           33,461           0.5
My Way              30,122           0.4
Dogpile.com         26,295           0.4
Other               275,365          3.8
All search          7,165,940        100.0
Source: Nielsen//NetRatings, 2007
Google search engine
Google has one of the largest databases of Web pages, including many other types of web documents (blog posts, wiki pages, group discussion threads) and document formats (e.g., PDFs, Word or Excel documents, PowerPoints).
Despite the presence of all these formats, Google's popularity ranking often makes pages worth looking at rise near the top of search results.
Second opinion in searching
Google alone is often not sufficient, however. Less than half the searchable Web is fully searchable in Google.
Overlap studies show that about half of the pages in any search engine database exist only in that database.
Getting a second opinion is therefore often worth your time: try Ask.com or Yahoo! Search.
Features in common
Things you CAN do in Google, Yahoo!, and Ask.com:
Phrase searching by enclosing terms in double quotes
OR searching with capitalized OR
- excludes, + requires exact form of word
Limit results by language in Advanced Search
Things NOT supported in Google, Yahoo!, or Ask.com:
Truncation: use OR searches for variants (airline OR airlines)
Case sensitivity: capitalization does not matter
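The common query features above (double-quoted phrases, capitalized OR, and - to exclude) can be mimicked with a small matcher. This is a hypothetical sketch that checks one document at a time, not the parser of any of the three engines; it matches case-insensitively (in line with the "no case sensitivity" row), does no truncation, and treats a leading + as an ordinary word.

```python
import re

# A token is either an optionally negated quoted phrase or a run of non-space characters.
TOKEN = re.compile(r'-?"[^"]+"|\S+')


def normalize(text: str) -> str:
    """Lower-case text and pad it with spaces so whole-word checks are simple."""
    return " " + " ".join(re.findall(r"[a-z0-9]+", text.lower())) + " "


def parse(query: str) -> tuple[list[list[str]], list[str]]:
    """Split a query into OR-groups (all must match) and excluded terms."""
    groups: list[list[str]] = []
    excluded: list[str] = []
    join_next = False
    for tok in TOKEN.findall(query):
        if tok == "OR":  # only the capitalized OR is an operator
            join_next = True
            continue
        negate = tok.startswith("-")
        term = tok.lstrip("-").strip('"')
        if negate:
            excluded.append(term)
        elif join_next and groups:
            groups[-1].append(term)  # alternative for the previous term
        else:
            groups.append([term])
        join_next = False
    return groups, excluded


def matches(doc: str, query: str) -> bool:
    """True if every group has a term in doc and no excluded term is present."""
    groups, excluded = parse(query)
    text = normalize(doc)

    def has(term: str) -> bool:
        return normalize(term) in text  # whole word(s) or exact phrase

    return all(any(has(t) for t in g) for g in groups) and not any(has(t) for t in excluded)


print(matches("Cheap airlines to Paris", "airline OR airlines"))  # True
```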
Features in which the search engines differ
Search engines compared: Google (www.google.com), Yahoo! Search (search.yahoo.com), Ask.com (www.ask.com)
Links to help
  Google: Google help pages
  Yahoo! Search: Yahoo! help pages
  Ask.com: Ask help pages
Size, type
  Google: HUGE. Size not disclosed in any way that allows comparison. Probably the biggest.
  Yahoo! Search: HUGE. Claims over 20 billion total "web objects."
  Ask.com: LARGE. Claims to have 2 billion fully indexed, searchable pages.
Noteworthy features and limitations
  Google: Popularity ranking using PageRank. Indexes the first 101KB of a Web page, and 120KB of PDFs. ~ before a word sometimes finds synonyms (~help > FAQ, tutorial, etc.).
  Yahoo! Search: Shortcuts give quick access to dictionary, synonyms, patents, traffic, stocks, encyclopedia, and more.
  Ask.com: Subject-Specific Popularity ranking. Suggests broader and narrower terms.
Boolean logic
  Google: Partial. AND assumed between words. Capitalize OR. - excludes. No ( ) or nesting. In Advanced Search, partial Boolean available in boxes.
  Yahoo! Search: Accepts AND, OR, NOT or AND NOT, and ( ). Must be capitalized. You must enclose terms joined by OR in parentheses (classic Boolean).
  Ask.com: Partial. AND assumed between words. Capitalize OR. - excludes. No ( ) or nesting.
+Requires/-Excludes
  Google: - excludes; + will allow you to retrieve "stop words" (e.g., +in).
  Yahoo! Search: - excludes; + will allow you to search common words: "+in truth".
  Ask.com: - excludes; + will allow you to retrieve "stop words" (e.g., +in).
Sub-searching
  Google: Sort of. At bottom of results page, click "Search within results" and enter more terms. Adds terms.
  Yahoo! Search: Add terms.
  Ask.com: Sort of. Add terms.
Results ranking
  Google: Based on page popularity measured in links to it from other pages: high rank if a lot of other pages link to it. Fuzzy AND also invoked. Matching and ranking based on "cached" version of pages that may not be the most recent version.
  Yahoo! Search: Automatic Fuzzy AND.
  Ask.com: Based on Subject-Specific Popularity, links to a page by related pages.
Field limiting
  Google: link:, site:, intitle:, inurl:. Offers U.S. Gov't Search and other special searches. Patent search.
  Yahoo! Search: link:, site:, intitle:, inurl:, url:, hostname:
  Ask.com: intitle:, inurl:, site:
Truncation, stemming
  Google: No truncation. Stems some words. Search variant endings and synonyms separately, separating with OR (capitalized): airline OR airlines
  Yahoo! Search: Neither. Search with OR as in Google.
  Ask.com: Neither. Search with OR as in Google.
Language
  Google: Yes. Major Romanized and non-Romanized languages in Advanced Search.
  Yahoo! Search: Yes. Major Romanized and non-Romanized languages.
  Ask.com: Yes. Major Romanized languages. Use Advanced Search to limit.
Translation
  Google: Yes, in "Translate this page" link following some pages. To and sometimes from English and major European languages and Chinese, Japanese, Korean.
  Yahoo! Search: Yes.
  Ask.com: No.
Role of search engines for e-commerce
80% of traffic determined by search
60% would use search to research a purchase
67% would choose a natural search result
Examples (each month in the UK):
500,000 search for shopping
100,000 for clothes, shirts & shoes
1,000,000 for mobile phone
250,000 for furniture
25,000 for bed linen
Why it matters financially
The terms "natural" and "organic" search engine listings describe the "editorial" search results on any particular engine. These results are claimed to be unbiased, meaning that the engine will not accept money to influence the ranking of any individual site.
Shopper enters search. About 10,000 people looked for ski jackets in Dec 2004.
Natural/organic search results: over 79,000 pages in the UK.
Analysis suggests roughly 30% of searchers will click a top-three result, and another 20% the rest of page one (top ten).
Paid-for search results cost 0.62p per click for this keyword, with a rough click-through rate of 5-10%.
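To turn these percentages into rough numbers, the sketch below applies them to the 10,000 ski-jacket searches. It assumes the 5-10% paid click-through rate applies to all 10,000 searchers and takes the 0.62p cost per click as stated; Decimal keeps the pence exact. The function names are hypothetical.

```python
from decimal import Decimal

SEARCHES = 10_000  # people who looked for ski jackets in Dec 2004 (from the slide)


def natural_clicks(searches: int, top3_pct: int = 30, rest_pct: int = 20) -> tuple[int, int]:
    """Estimated clicks on a top-three natural result and on the rest of page one."""
    return searches * top3_pct // 100, searches * rest_pct // 100


def paid_clicks_and_cost(searches: int, ctr_pct: int,
                         cost_per_click_pence: Decimal = Decimal("0.62")) -> tuple[int, Decimal]:
    """Clicks on a paid listing at the given click-through rate, and their cost in pence."""
    clicks = searches * ctr_pct // 100
    return clicks, clicks * cost_per_click_pence


print(natural_clicks(SEARCHES))  # (3000, 2000)
for ctr in (5, 10):  # the slide's 5-10% range
    clicks, cost = paid_clicks_and_cost(SEARCHES, ctr)
    print(f"{ctr}% CTR: {clicks} clicks, {cost}p")
```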
Why it matters financially
Search term        Number of searches   Click-thrus   Visitors   Conv. ratio   Orders   Value p.m.   Value p.a.
DVD player         300,000              30%           90,000     0.05%         45       20,205       242,460
Sony DVD player    10,000               30%           3,000      1%            30       13,470       161,640
Sony RDR-GX7       1,000                30%           300        5%            15       6,735        80,820
Assumes a top-three search result and a purchase price of 449.
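Each row of the table is a chain of multiplications: visitors = searches × click-through, orders = visitors × conversion ratio, value p.m. = orders × 449, and value p.a. = value p.m. × 12. The Python sketch below reproduces the three rows. It assumes the search counts are monthly (the table derives a monthly value from them directly) and stores conversion ratios in basis points so the arithmetic stays in integers; the names are illustrative.

```python
# (search term, searches, conversion ratio in basis points; 1 bp = 0.01%)
ROWS = [
    ("DVD player", 300_000, 5),        # 0.05%
    ("Sony DVD player", 10_000, 100),  # 1%
    ("Sony RDR-GX7", 1_000, 500),      # 5%
]


def forecast(searches: int, conversion_bp: int,
             click_pct: int = 30, price: int = 449) -> tuple[int, int, int, int]:
    """Visitors, orders, value per month and value per year for one search term."""
    visitors = searches * click_pct // 100       # top-three click-through
    orders = visitors * conversion_bp // 10_000  # conversion ratio
    value_pm = orders * price
    return visitors, orders, value_pm, value_pm * 12


for term, searches, bp in ROWS:
    print(term, forecast(searches, bp))
```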
What's the goal?
All products and categories appear in Google & other major search engines: completely & elegantly
Sites perform well for generic searches
[Pie chart: search engine market share. Segment values 55.8%, 9.6%, 3.8%, 2.6%, 2.2%, 1.5%, 21.7%, 2.8%; legend labels Yahoo, MSN, AOL, Lycos, Altavista, Ask Jeeves, Others (one segment label lost)]
How do you come top of Google?
1) Indexability
2) Relevance
3) Link popularity
Indexability
The site must be navigable by robots and spiders
Its content must be readable
Robots don't like frames
Robots don't like Flash
Robots can't read into product catalogues
What robots read and index
page titles
URLs
links
image names
body copy
description meta tag
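A spider's view of a page can be approximated with Python's standard html.parser module. The RobotView class below is a hypothetical sketch that pulls out the page title, link targets, image file names, visible text and the description meta tag; the page's URL is known to the crawler from the fetch itself, so it is not parsed here. Real spiders also skip script and style text, which this sketch does not.

```python
from html.parser import HTMLParser


class RobotView(HTMLParser):
    """Collect the parts of a page listed above: title, links,
    image file names, body copy and the description meta tag."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.description = ""
        self.links: list[str] = []
        self.images: list[str] = []
        self.body_text: list[str] = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "img" and attrs.get("src"):
            # Keep only the file name, e.g. "red-ski-jacket.jpg".
            self.images.append(attrs["src"].rsplit("/", 1)[-1])
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._in_title:
            self.title += text
        else:
            self.body_text.append(text)


page = RobotView()
page.feed('<html><head><title>Ski Jackets</title>'
          '<meta name="description" content="Waterproof ski jackets"></head>'
          '<body><h1>Ski jackets</h1><a href="/mens">Men\'s</a>'
          '<img src="/img/red-ski-jacket.jpg" alt="Red ski jacket"></body></html>')
page.close()
print(page.title, page.links, page.images)
```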
Relevance
The content of your site must be relevant
It must reflect the keywords
Keywords are the words or phrases that web users use to search for information on the web
Where and how you place and present these keywords in your site is vital
Choosing keywords
Keywords reflect:
Your core business/ product/ service offering
Your unique sales/ value proposition
What your customers are looking for on the Internet
Other influencing factors
Popularity
Saturation
Relevance
Priorities (& quantity)
Where to put keywords
Page title (the single most important place)
Description meta tag (appears in listings)
Body headers (H1) and copy
Image/ file names
Image alt tags
URLs
Keywords meta tag
Offsite descriptions (directories, etc.)
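A quick self-check for the placement advice above could look like the following sketch. It assumes the page has already been split into named parts (field names such as "h1" and "keywords_meta" are hypothetical) and reports where a keyword phrase is missing; hyphens, underscores and slashes are treated as spaces so that a URL like /ski-jackets counts as containing "ski jackets".

```python
import re

# Places from the list above, most important first (field names are made up).
PLACES = ["title", "description", "h1", "body", "image_names", "alt_text", "url", "keywords_meta"]


def _norm(text: str) -> str:
    """Lower-case and turn any run of non-alphanumerics into one space, padded at both ends."""
    return " " + re.sub(r"[^a-z0-9]+", " ", text.lower()).strip() + " "


def keyword_placement(page: dict[str, str], keyword: str) -> dict[str, bool]:
    """Report, for each place, whether the keyword phrase appears there as whole words."""
    kw = _norm(keyword)
    return {place: kw in _norm(page.get(place, "")) for place in PLACES}


page = {
    "title": "Ski Jackets | Example Shop",
    "url": "http://example.com/ski-jackets",
    "body": "Our range of jackets",
}
result = keyword_placement(page, "ski jackets")
print([place for place, ok in result.items() if not ok])
```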
What matters
Meta tags
Description (pay attention to size)
Page title (pay attention to size)
Category page (make it relevant)
Product page (make it relevant)
Offsite relevance (directories, links)
Popularity
Determined primarily by the number of inbound & relevant links
Influenced by frequency and recency of updates
Visible in Google's PageRank
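PageRank's core idea, that a link passes a share of the linking page's own score to the page it points at, can be shown with the textbook power-iteration formulation. This is a simplified sketch (damping factor 0.85, fixed number of iterations), not Google's production ranking; the page names are hypothetical.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Simplified PageRank by power iteration over a small link graph."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets a small base share (the "random jump").
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank, split evenly, to the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                for p in pages:
                    new[p] += damping * rank[page] / n
        rank = new
    return rank


links = {"home": ["shop"], "blog": ["shop"], "shop": ["home"]}
ranks = pagerank(links)
print(max(ranks, key=ranks.get))  # shop
```

The page with two inbound links ("shop") ends up highest, and "blog", which nothing links to, keeps only the base share.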
How do I increase popularity?
Get lots of people to link to your site (with the right keywords)
Common approaches:
Get in the important directories
Self-managed affiliate programmes
Develop valuable content
Research, surveys and quizzes
Weblogs (blogs)
Social bookmarks (del.icio.us)
The two most important links
Open Directory Project
Yahoo Directory
How do Search Engines Work?
Searching a database
Search engines for the general web (like all those listed above) do not really search the World Wide Web directly.
Each one searches a database of the full text of web pages selected from the billions of web pages out there residing on servers.
When you search the web using a search engine, you are always searching a somewhat stale copy of the real web page.
When you click on links provided in a search engine's search results, you retrieve from the server the current version of the page.
Robots: Spiders
Search engine databases are selected and built by computer robot programs called spiders (Web crawlers).
Although it is said they "crawl" the web in their hunt for pages to include, in truth they stay in one place.
They find pages for potential inclusion by following the links in the pages they already have in their database (i.e., already "know about").
They cannot think, type a URL or use judgment to "decide" to go look something up and see what's on the web about it.
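The link-following behaviour described above is essentially a breadth-first traversal: the spider stays put, and its list of known but unvisited URLs grows as it reads pages. In this sketch the network fetch is replaced by a hypothetical get_links callback over an in-memory dictionary; a real crawler would download and parse each page and respect robots rules and rate limits.

```python
from collections import deque
from typing import Callable, Iterable


def crawl(seeds: list[str], get_links: Callable[[str], Iterable[str]],
          max_pages: int = 100) -> list[str]:
    """Breadth-first crawl: visit known pages and queue any new links they contain."""
    seen = set(seeds)  # every URL the spider already "knows about"
    queue = deque(seeds)
    visited: list[str] = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for link in get_links(url):
            if link not in seen:  # only queue pages not yet known
                seen.add(link)
                queue.append(link)
    return visited


# A tiny in-memory "web" stands in for real HTTP fetching (hypothetical pages).
WEB = {"home": ["about", "products"], "products": ["dvd", "home"], "about": [], "dvd": []}
print(crawl(["home"], lambda url: WEB.get(url, [])))
# ['home', 'about', 'products', 'dvd']
```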
Page link submission
If a web page is never linked to from any other page, search engine spiders cannot find it.
The only way a brand new page (one that no other page has ever linked to) can get into a search engine is for its URL to be sent by some human to the search engine companies as a request that the new page be included.
All search engine companies offer ways to do this.
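Because spiders only discover pages through links, a page that no other page links to never enters the crawl. A hypothetical helper that lists such pages for a site, given its internal link map, might look like this; the pages it returns are the ones that would have to be submitted by hand.

```python
def orphan_pages(site: dict[str, list[str]]) -> set[str]:
    """Pages that no other page links to; a spider following links cannot reach them."""
    linked_to = {target for source, targets in site.items()
                 for target in targets if target != source}  # self-links don't count
    return set(site) - linked_to


site = {"home": ["about"], "about": ["home"], "new-product": []}
print(orphan_pages(site))  # {'new-product'}
```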
Indexing
After spiders find pages, they pass them on to another computer program for "indexing."
This program identifies the text, links, and other content in the page and stores it in the search engine database's files so that the database can be searched by keyword and whatever more advanced approaches are offered, and the page will be found if your search matches its content.
"Spiders" take a Web page's content and create key search words that enable online users to find pages they're looking for.
What to look at
When the Google spider looked at an HTML page, it took note of two things: the words within the page, and where the words were found.
Words occurring in the title, subtitles, meta tags and other positions of relative importance were noted for special consideration during a subsequent user search.
The Google spider was built to index every significant word on a page, leaving out the articles "a," "an" and "the."
Other spiders take different approaches.
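Recording where each word appears, not just whether it appears, is what lets an engine favour title matches later. This sketch builds such a positional index for one page, skipping only the articles "a", "an" and "the" as described above; the field names and the (url, field, position) tuple layout are assumptions for illustration.

```python
import re
from collections import defaultdict

ARTICLES = {"a", "an", "the"}  # the only words left out, per the description above


def index_page(url: str, fields: dict[str, str]) -> dict[str, list[tuple[str, str, int]]]:
    """Map each significant word to where it occurs: (url, field, word position)."""
    postings: dict[str, list[tuple[str, str, int]]] = defaultdict(list)
    for field, text in fields.items():
        for pos, word in enumerate(re.findall(r"[a-z0-9]+", text.lower())):
            if word not in ARTICLES:
                postings[word].append((url, field, pos))
    return dict(postings)


postings = index_page("http://example.com/",
                      {"title": "The Ski Shop", "body": "A shop for ski jackets"})
print(postings["ski"])
```

Positions are counted over all words, including the skipped articles, so they still reflect where the word sits in the original text.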
Meta Tags
Meta tags allow the owner of a page to specify key words and concepts under which the page will be indexed.
There is, however, a danger in over-reliance on meta tags, because a careless or unscrupulous page owner might add meta tags that fit very popular topics but have nothing to do with the actual contents of the page.
To protect against this, spiders will correlate meta tags with page content, rejecting the meta tags that don't match the words on the page.
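One plausible reading of "correlate meta tags with page content" is to keep a meta keyword only if its words actually occur on the page. The rule below (every word of a comma-separated keyword phrase must appear in the body text) is a hypothetical simplification, not any engine's documented behaviour.

```python
import re


def _words(text: str) -> set[str]:
    """Lower-cased set of alphanumeric words in text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def accepted_meta_keywords(meta_keywords: str, body: str) -> list[str]:
    """Keep a comma-separated meta keyword only if all of its words occur in the body."""
    body_words = _words(body)
    accepted = []
    for phrase in meta_keywords.split(","):
        words = _words(phrase)
        if words and words <= body_words:
            accepted.append(phrase.strip())
    return accepted


print(accepted_meta_keywords("ski jackets, celebrity gossip, snowboards",
                             "Buy ski jackets and snowboards online"))
# ['ski jackets', 'snowboards']
```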
Meta tag (NTU)
The Meta Description Tag
The meta description tag allows you to influence the description of your page in the crawlers that support the tag.
But Google may ignore the meta description tag and instead automatically generate its own description for the page.
Meta Robots Tag
The robots tag lets you specify that a particular page should NOT be indexed by a search engine.
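The tag in question is written `<meta name="robots" content="noindex">`. A crawler-side check for it might look like this sketch using Python's html.parser. It treats both noindex and none (shorthand for noindex, nofollow) as a request not to index, and ignores robots.txt and HTTP headers, which can carry similar instructions. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            content = attrs.get("content") or ""
            self.directives |= {d.strip().lower() for d in content.split(",") if d.strip()}


def may_index(html: str) -> bool:
    """False if the page asks not to be indexed ("noindex", or "none")."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return not ({"noindex", "none"} & parser.directives)


print(may_index('<meta name="robots" content="noindex, nofollow">'))  # False
```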
Indexing: weight
To make for more useful results, most search engines store more than just the word and URL.
An engine might store the number of times that the word appears on a page.
The engine might assign a weight to each entry, with increasing values assigned to words as they appear near the top of the document, in sub-headings, in links, in the meta tags or in the title of the page.
Each commercial search engine has a different formula for assigning weight to the words in its index.
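Since every engine's weighting formula is different and secret, the numbers in this sketch are invented purely for illustration. It scores one term on one page by adding a field weight for every occurrence (so repetition counts, as with storing the number of times a word appears), with a bonus for body words in the first 50 positions.

```python
import re

# Invented weights for illustration only; real engines keep their formulas secret.
FIELD_WEIGHTS = {"title": 5.0, "meta": 3.0, "heading": 3.0, "link": 2.0, "body": 1.0}


def term_weight(term: str, fields: dict[str, str],
                top_words: int = 50, top_bonus: float = 1.5) -> float:
    """Sum a weight for every occurrence of term, depending on where it occurs."""
    term = term.lower()
    score = 0.0
    for field, text in fields.items():
        for pos, word in enumerate(re.findall(r"[a-z0-9]+", text.lower())):
            if word != term:
                continue
            weight = FIELD_WEIGHTS.get(field, 1.0)
            if field == "body" and pos < top_words:
                weight *= top_bonus  # words near the top of the document count more
            score += weight
    return score


print(term_weight("ski", {"title": "Ski jackets", "body": "ski"}))  # 6.5
```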
How Search Engines Rank Web Pages
Crawler-based search engines determine relevancy by following a set of rules, known as an algorithm.
Exactly how a particular search engine's algorithm works is a closely kept trade secret. However, all major search engines follow the general rules below.
How Search Engines Rank Web Pages
One of the main rules in a ranking algorithm involves the location and frequency of keywords on a web page. Call it the location/frequency method, for short.
Pages with the search terms appearing in the HTML title tag are often assumed to be more relevant than others to the topic.
Search engines will also check to see if the search keywords appear near the top of a web page.
Frequency is the other major factor in how search engines determine relevancy. A search engine will analyze how often keywords appear in relation to other words in a web page.
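The three location/frequency signals described above can be measured for a single-word keyword as follows. This is an illustrative sketch: the 100-word cut-off for "near the top" is an arbitrary assumption, and density is simply the keyword's occurrences divided by the total number of words in the body.

```python
import re


def location_frequency(keyword: str, title: str, body: str,
                       top_n: int = 100) -> dict[str, object]:
    """Title match, near-the-top match and keyword density for one keyword."""
    kw = keyword.lower()
    title_words = re.findall(r"[a-z0-9]+", title.lower())
    body_words = re.findall(r"[a-z0-9]+", body.lower())
    return {
        "in_title": kw in title_words,
        "near_top": kw in body_words[:top_n],
        "density": body_words.count(kw) / len(body_words) if body_words else 0.0,
    }


print(location_frequency("stamps", "Collecting stamps",
                         "Stamps are small. I collect stamps and coins."))
# {'in_title': True, 'near_top': True, 'density': 0.25}
```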
How Search Engines Rank Web Pages
Search engines also use "off the page" ranking criteria.
Off the page factors are those that webmasters cannot easily influence. Chief among these is link analysis.
By analyzing how pages link to each other, a search engine can determine both what a page is about and whether that page is deemed to be "important."
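One way link analysis can reveal what a page is about is by reading the link text other pages use when pointing at it. The sketch below tallies those words per target page, ignoring self-links; the input format of (source, target, anchor text) tuples is an assumption, and real engines combine this with many other signals.

```python
import re
from collections import Counter, defaultdict


def anchor_terms(links: list[tuple[str, str, str]]) -> dict[str, Counter]:
    """Count the words in the link text pointing at each page, ignoring self-links."""
    terms: dict[str, Counter] = defaultdict(Counter)
    for source, target, anchor_text in links:
        if source != target:
            terms[target].update(re.findall(r"[a-z0-9]+", anchor_text.lower()))
    return dict(terms)


links = [
    ("a.com", "shop.com", "cheap ski jackets"),
    ("b.com", "shop.com", "ski shop"),
    ("shop.com", "shop.com", "home"),
]
print(anchor_terms(links)["shop.com"].most_common(1))  # [('ski', 2)]
```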
How Search Engines Rank Web Pages
In addition, sophisticated techniques are used to screen out attempts by webmasters to build "artificial" links designed to boost their rankings.
Another off the page factor is click-through measurement.
A search engine may watch what results someone selects for a particular search, then eventually drop high-ranking pages that aren't attracting clicks, while promoting lower-ranking pages that do pull in visitors.
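Click-through feedback could be folded into ranking as in the following deliberately crude, hypothetical sketch, which ignores every signal except clicks. Each result's click-through rate is smoothed towards a default of 10%, so a page with only a handful of impressions is not promoted or dropped on thin evidence, and results are re-sorted by that rate. Python's sort is stable, so results with equal rates keep their original order. The smoothing constants are arbitrary.

```python
def rerank(results: list[str], impressions: dict[str, int], clicks: dict[str, int],
           prior_views: int = 10, prior_ctr: float = 0.10) -> list[str]:
    """Re-sort results by click-through rate, smoothed towards prior_ctr."""
    def smoothed_ctr(url: str) -> float:
        # Pretend every page already had prior_views impressions at prior_ctr,
        # so a few lucky (or unlucky) impressions cannot move it far.
        return ((clicks.get(url, 0) + prior_views * prior_ctr)
                / (impressions.get(url, 0) + prior_views))

    # sorted() is stable (also with reverse=True), so ties keep the original order.
    return sorted(results, key=smoothed_ctr, reverse=True)


ranked = ["a.com", "b.com", "c.com"]  # current order, best first
shown = {"a.com": 1000, "b.com": 1000, "c.com": 1000}
clicked = {"a.com": 10, "b.com": 50, "c.com": 300}
print(rerank(ranked, shown, clicked))  # ['c.com', 'b.com', 'a.com']
```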
Placement tips for being most "relevant"
Pick Your Target Keywords
How do you think people will search for your web page? The words you imagine them typing into the search box are your target keywords.
Your target keywords should always be at least two words long.
Position Your Keywords
Make sure your target keywords appear in the crucial locations on your web pages. The page's HTML title tag is most important.
Build your titles around the top two or three phrases that you would like the page to be found for.
Create Relevant Content
Your keywords need to be reflected in the page content.
Consider "expanding" your text references, where appropriate.
For example, a stamp collecting page might have references to "collectors" and "collecting." Expanding these references to "stamp collectors" and "stamp collecting" reinforces your strategic keywords in a legitimate and natural manner.
Build Inbound Links
Every major search engine uses link analysis as part of its ranking algorithm.
By building links, you can help improve how well your pages perform in link analysis systems.
You want links from good web pages that are related to the topics you want to be found for.
Build Inbound Links
Here's one simple means to find those good links.
Using a search engine, search for your target keywords. Look at the pages that appear in the top results.
Now visit those pages and ask the site owners if they will link to you. Not everyone will, especially sites that are extremely competitive with yours.
Submit Your Key Pages
Most search engines will index the other pages from your web site by following links from a page you submit to them.
Submit the top two or three pages that best summarize your web site.
Verify and Maintain Your Listing
Invisible Web pages
Some types of pages and links are excluded from most search engines by policy.
Others are excluded because search engine spiders cannot access them.
Pages that are excluded are referred to as the "Invisible Web": what you don't see in search engine results.
Submitting To Directories
Submitting to Directories: Yahoo & The Open Directory
The Open Directory Project (aka ODP or DMOZ) is a volunteer-built guide to the web. It is provided as an option at many major search engines, including Google. Given this, being listed with the Open Directory can add value to any site.
Submission is absolutely free.
Yahoo maintains its own independent "directory" of Web sites. Anyone can use Standard submission to submit for free to a non-commercial category.
dmoz (from directory.mozilla.org, ODP's original domain name)
Paid Search Advertising
Paid Search Advertising: Google AdWords, Yahoo Search Marketing & Microsoft adCenter
Every major search engine with significant market share accepts paid listings.
This unique form of search engine advertising guarantees that your site will appear in the top results for the keyword terms you target within a day or less.
Paid search listings are also called sponsored listings and/or Pay Per Click (PPC) listings.
What Makes a Search Engine Good?
Part of search engine: Database of web documents
Variables, and their implications for your searches:
  Size of database: How many documents does the search engine claim it has? How much of the total web are you able to search?
  Freshness ("up-to-dateness"): Search engine databases consist of copies of web pages and other documents that were made when their crawlers or spiders last visited each site. How often is the database refreshed to find new pages? How often do their crawlers update the copies of the web pages you are searching?
  Completeness of text: Is the database really "full" text, or only parts of the pages? Is every word indexed?
  Types of documents offered: All search engines offer web pages. Do they also have extensive PDF, Word, Excel, PowerPoint, and other formats like WordPerfect? Are they full-text searchable?
  Speed and consistency: How fast is it? How consistent is it? Do you get different results at different times?