Date post: | 12-Jan-2016 |
Upload: | dominic-harrington |
ITIS 1210
Introduction to Web-Based Information Systems
Chapter 27
How Internet Searching Works
Introduction
Major advantage of Internet
  HUGE amounts of information available
Major DISadvantage:
  It’s DISORGANIZED
  There is no central ownership so there’s
    No common filing system
      No “Dewey Decimal System”
    No common glossary
I think the book confuses these two terms but …
Two popular solutions
  Indexes
  Search Engines
Indexes
  Provide a lookup mechanism
  Associate lookup terms with information
  Example: telephone book
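
The lookup idea can be sketched in a few lines of Python, with a dictionary playing the role of the index. This is only an illustration: the names, the numbers, and the `look_up` helper are invented for the example.

```python
# A tiny "telephone book": the index maps a lookup term (a name)
# to the information associated with it (a number).
# All entries are made-up placeholders.
phone_book = {
    "Alice Example": "555-0100",
    "Bob Sample": "555-0101",
}


def look_up(index, term):
    """Return the information filed under `term`, or None if it is absent."""
    return index.get(term)


print(look_up(phone_book, "Alice Example"))
print(look_up(phone_book, "Nobody Here"))
```

The index does not hold the person, only a pointer to how to reach them, which is the same relationship a Web index has to the pages it lists.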
On the Web, indexes tend to be hierarchical
  Click on a link
  Takes you to a list of sub-categories
  May be several levels deep
  At some point documents associated with that category become available
  Yahoo! for example
Similar to a binary search: each click narrows the choices
Example: Yahoo!
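
A hierarchical index of this kind can be modeled as nested dictionaries, where each "click" moves one level deeper. The sketch below assumes a made-up directory; the category names and `example.com` URLs are placeholders, not real listings.

```python
# A hypothetical directory: categories nest inside categories, and the
# leaves are lists of documents (the URLs here are placeholders).
directory = {
    "Computers": {
        "Internet": {
            "Searching": ["http://example.com/search-tips"],
        },
    },
    "Recreation": {
        "Travel": ["http://example.com/travel-guide"],
    },
}


def drill_down(index, path):
    """Follow one category per 'click' and return what is found there."""
    node = index
    for category in path:
        node = node[category]  # each click narrows the choices
    return node


# Two clicks show the sub-categories; a third reaches the documents.
print(sorted(drill_down(directory, ["Computers", "Internet"])))
print(drill_down(directory, ["Computers", "Internet", "Searching"]))
```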
Search engines are massive databases
  Don’t present information in a hierarchical form
  Instead
    You provide search term
    Database returns relevant results
Confusion results from the fact that databases are organized around indexes
Google tends to be the default search engine
  Big mistake: there are other ones out there, some better
Search engines have three basic characteristics
  Spider: gathers information from the Web
  Database: stores that information
  Search tool: lets users search the database
Spiders work constantly
  Retrieving information from Web sites
  Presenting it to search engine for analysis
Engines characterize the data
  Remember, HTML is basically just text files
  By keyword in document
    Some use all, some only a few
  By entire content of document
  By size, title, heading, metadata, etc.
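
Because HTML is just text, this kind of characterization can be sketched with Python's standard `html.parser` module. The `PageSummary` class, the `characterize` function, and the sample page are all assumptions made for illustration; a real engine would apply far more elaborate rules.

```python
import re
from html.parser import HTMLParser


class PageSummary(HTMLParser):
    """Collect the title and the words of an HTML document."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""
        self.words = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data
        # Every run of letters or digits in the text counts as a keyword.
        self.words.extend(re.findall(r"[a-z0-9]+", data.lower()))


def characterize(html):
    """Describe a page by title, size in characters, and distinct keywords."""
    parser = PageSummary()
    parser.feed(html)
    return {
        "title": parser.title.strip(),
        "size": len(html),
        "keywords": sorted(set(parser.words)),
    }


sample = (
    "<html><head><title>Web Spiders</title></head>"
    "<body><h1>Spiders</h1><p>Spiders crawl the Web.</p></body></html>"
)
summary = characterize(sample)
print(summary["title"], summary["keywords"])
```

This version keeps every word; an engine that uses "only a few" keywords would filter the list, for example by dropping very common words.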
Results from this analysis stored in database
  Typically keyword & URL
  Other information might be stored as well
    First few sentences on Web page
    Last date updated
Search tool provides access to database for users
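
Storing "keyword & URL" pairs is usually described as building an inverted index. A minimal sketch, assuming the spider has already fetched two made-up pages (the `pages` dictionary and `build_index` function are hypothetical):

```python
from collections import defaultdict

# Hypothetical pages a spider has already fetched (URL -> page text).
pages = {
    "http://example.com/a": "search engines use spiders",
    "http://example.com/b": "spiders follow links",
}


def build_index(pages):
    """Map each keyword to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


index = build_index(pages)
print(sorted(index["spiders"]))
```

Looking up a keyword is then a single dictionary access, which is why the search tool can answer quickly even when the database is large.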
User provides keyword (or other search term)
Tool returns list of data in database that fits that term
Different search engines return results in different ways
  Weigh results
  Sponsored links
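
One simple way to weigh results is to count how often the search term appears on each page and list the highest counts first. This is only one possible scheme, chosen here for illustration; the sample pages and the `search` function are assumptions, and real engines combine many more signals.

```python
# Hypothetical page texts keyed by URL.
pages = {
    "http://example.com/a": "spiders gather pages and spiders never rest",
    "http://example.com/b": "a spider is a program",
    "http://example.com/c": "spiders follow links",
}


def search(pages, term):
    """Return URLs whose text contains `term`, most occurrences first."""
    term = term.lower()
    scored = []
    for url, text in pages.items():
        count = text.lower().split().count(term)
        if count:
            scored.append((count, url))
    # Highest count first; ties are broken alphabetically by URL.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [url for count, url in scored]


print(search(pages, "spiders"))
```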
No search engine indexes the entire Internet
All have strengths and weaknesses
  Limiting yourself to a single search engine is
    Intellectually lazy
    Academically short-sighted
    Ultimately self-defeating
  Would you only shop at Wal-Mart?
How Internet Search Engines Work
Spiders are just programs
  They follow a link to a Web page
    Really, the Web page comes to them
Different spiders follow different rules
  Written by different programmers for different companies
  Some follow every link
  Some ignore links to graphics files or newsgroups
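
The link-following rules can be sketched with the standard library alone. The example below parses a small made-up page instead of fetching one over the network, and applies one possible rule set: skip graphics files and newsgroup links. The `LinkCollector` class, the `links_to_follow` function, and the extension list are assumptions for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

SKIPPED_EXTENSIONS = (".gif", ".jpg", ".jpeg", ".png")  # graphics files
SKIPPED_SCHEMES = ("news", "nntp")  # newsgroup links


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def links_to_follow(base_url, html):
    """Return the absolute URLs this spider would visit next."""
    collector = LinkCollector()
    collector.feed(html)
    keep = []
    for href in collector.links:
        url = urljoin(base_url, href)  # resolve relative links
        parts = urlparse(url)
        if parts.scheme in SKIPPED_SCHEMES:
            continue
        if parts.path.lower().endswith(SKIPPED_EXTENSIONS):
            continue
        keep.append(url)
    return keep


page = (
    '<a href="/about.html">About</a>'
    '<a href="logo.gif">Logo</a>'
    '<a href="news:comp.infosystems.www">Newsgroup</a>'
)
print(links_to_follow("http://example.com/index.html", page))
```

A spider written to "follow every link" would simply skip the two filtering checks.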
Content analyzed and stored in database
  Indexes are built and updated
Users visit search engine site
  Provide search term
    Describes the information you want
    Might ask to match keyword(s)
    Might ask to limit by
      Date
      Type of file
      Size
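
A search that matches a keyword and optionally limits by date, file type, or size might look like the sketch below. The `Record` fields, the sample database, and the `query` parameters are all hypothetical; they only mirror the limits listed above.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Record:
    url: str
    keywords: set
    updated: date
    file_type: str
    size: int  # bytes


# Made-up database entries for illustration.
database = [
    Record("http://example.com/a.html", {"spiders", "search"},
           date(2015, 6, 1), "html", 4000),
    Record("http://example.com/b.pdf", {"spiders"},
           date(2010, 1, 15), "pdf", 250000),
]


def query(records, keyword, since=None, file_type=None, max_size=None):
    """Return URLs matching the keyword and every limit that was given."""
    hits = []
    for record in records:
        if keyword not in record.keywords:
            continue
        if since is not None and record.updated < since:
            continue
        if file_type is not None and record.file_type != file_type:
            continue
        if max_size is not None and record.size > max_size:
            continue
        hits.append(record.url)
    return hits


print(query(database, "spiders"))
print(query(database, "spiders", since=date(2014, 1, 1)))
```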
Search engine software
  Searches database for information matching keywords
  Results formatted as a Web page in HTML
  Sent back to user
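
The last step, formatting the results as a Web page, can be sketched as plain string building. The `results_page` function and its layout are assumptions; the one detail worth copying is escaping the user's search term before placing it in the HTML.

```python
from html import escape


def results_page(term, urls):
    """Build a minimal HTML page listing the matching URLs in order."""
    items = "\n".join(
        f'<li><a href="{escape(url)}">{escape(url)}</a></li>' for url in urls
    )
    return (
        f"<html><head><title>Results for {escape(term)}</title></head>\n"
        f"<body><ol>\n{items}\n</ol></body></html>"
    )


page = results_page("spiders & bots", ["http://example.com/a"])
print(page)
```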
How Meta-Search Software Works
Problem
  Web is too large for any one search engine to consistently provide superior results
  Best approach is a combination of results from multiple search engines
Meta-search software lets you search multiple search engines for results
Better than searching each one individually
Some software uses multiple “agents”
  Specialized software that does one thing very well, like searching
Each agent contacts a single search engine
The agents know how each engine functions
  How to ask the right question
Individual search engines report their results to the agent
Agents report results back to meta-search engine
This process may be repeated multiple times by each agent
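
The agent arrangement can be sketched as follows. The two "engines" are local stand-in functions with deliberately different query formats, so no network access is involved; every name here (`engine_one`, `engine_two`, `Agent`, `gather`) is hypothetical.

```python
# Two stand-in "search engines" with different query conventions.
# Both are hypothetical; a real agent would make network requests.
def engine_one(query_string):
    # Expects terms joined with "+", e.g. "web+spiders".
    data = {"web+spiders": ["http://example.com/a", "http://example.com/b"]}
    return data.get(query_string, [])


def engine_two(terms):
    # Expects a list of separate terms.
    data = {("web", "spiders"): ["http://example.com/b", "http://example.com/c"]}
    return data.get(tuple(terms), [])


class Agent:
    """Knows how to phrase a question for one particular engine."""

    def __init__(self, name, engine, phrase):
        self.name = name
        self.engine = engine
        self.phrase = phrase  # turns the user's words into the engine's format

    def search(self, words):
        return self.engine(self.phrase(words))


agents = [
    Agent("one", engine_one, lambda words: "+".join(words)),
    Agent("two", engine_two, lambda words: list(words)),
]


def gather(agents, words):
    """Send the same request through every agent; collect results per engine."""
    return {agent.name: agent.search(words) for agent in agents}


reports = gather(agents, ["web", "spiders"])
print(reports)
```

Keeping the engine-specific phrasing inside each agent means a new engine can be added without touching the rest of the meta-search code.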
Meta-search engine deletes duplicate results
  May rank based on how many search engines report a specific site
    One indication of quality: lots of references
You see a result you like, click on the link, and are taken to that Web page
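
Deleting duplicates and ranking by how many engines report a site can be sketched with `collections.Counter`. The per-engine reports below are invented, and the tie-breaking rule (alphabetical by URL) is an arbitrary choice made for the example.

```python
from collections import Counter

# Hypothetical reports: engine name -> URLs it returned.
reports = {
    "engine_one": ["http://example.com/a", "http://example.com/b"],
    "engine_two": ["http://example.com/b", "http://example.com/c"],
    "engine_three": ["http://example.com/b", "http://example.com/a"],
}


def merge(reports):
    """Return each URL once, ordered by how many engines reported it."""
    votes = Counter()
    for urls in reports.values():
        votes.update(set(urls))  # one vote per engine per URL
    # Most-reported first; ties are broken alphabetically by URL.
    return sorted(votes, key=lambda url: (-votes[url], url))


merged = merge(reports)
print(merged)
```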
No two meta-search engines are alike
  Some only refer to popular search engines
  Some include lesser-known search engines, newsgroups, or databases
Also differ on how results are presented
  Some indicate which search engine returned results, some don’t
Links
http://www.dmoz.org/Computers/Internet/Searching/Metasearch/
http://www.lib.berkeley.edu/TeachingLib/Guides/Internet/MetaSearch.html
http://news.zdnet.com/2100-9588_22-5647280.html