8/2/2019 Search Engine Presentation Final
http://slidepdf.com/reader/full/search-engine-presentation-final 1/25
Presented by:
Rasik Mevada
Vishal Dabhi
Vimal Nair
Ravi Mathai
Search Engines
INTRODUCTION
HISTORY
TYPES OF SEARCH ENGINE
HOW SEARCH ENGINES WORK?
CONCLUSION
Introduction

A search engine is a software program that searches for sites based on the words that you designate as search terms.

Search engines look through their own databases of information in order to find what you are looking for.

Some search engines also mine data available in databases or open directories. Unlike web directories, which are maintained only by human editors, search engines also maintain real-time information by running an algorithm on a web crawler.
Introduction Contd…

A web search engine is designed to search for information on the World Wide Web.

The search results are generally presented in a list of results, often referred to as search engine results pages (SERPs).

The results may consist of web pages, images, and other types of files.

"Search engine" is the popular term for an Information Retrieval (IR) system.
History

In 1990, the first tool for searching on the internet was "Archie".

The program downloaded the directory listings of all the files located on public anonymous FTP (File Transfer Protocol) sites, creating a searchable database of file names.

However, Archie did not index the contents of these sites, since the amount of data was so limited that it could be readily searched manually.
In 1991, the rise of Gopher led to two new search programs, Veronica and Jughead.

Like Archie, they searched the file names and titles stored in Gopher index systems. Veronica (Very Easy Rodent-Oriented Net-wide Index to Computerized Archives) provided a keyword search of most Gopher menu titles in the entire Gopher listings.

Jughead (Jonzy's Universal Gopher Hierarchy Excavation And Display) was a tool for obtaining menu information from specific Gopher servers.
In June 1993, Matthew Gray, then at MIT, produced what was probably the first web robot, the Perl-based World Wide Web Wanderer, and used it to generate an index called "Wandex".

The web's second search engine, Aliweb, appeared in November 1993. Aliweb did not use a web robot, but instead depended on being notified by website administrators of the existence at each site of an index file in a particular format.

One of the first "full text" crawler-based search engines was WebCrawler, which came out in 1994.
SEARCH ENGINE TYPES

Crawler-Based Search Engines

Human-Powered Directories

Hybrid Search Engines (Mixed Results)
1. Crawler-Based Search Engines

A Web crawler is a computer program that browses the World Wide Web in a methodical, automated manner. Other terms for Web crawlers are ants, automatic indexers, bots, Web spiders, Web robots, or, especially in the FOAF community, Web scutters.

This process is called Web crawling or spidering. Many sites, in particular search engines, use spidering as a means of providing up-to-date data. Web crawlers are mainly used to create a copy of all the visited pages for later processing by a search engine that will index the downloaded pages to provide fast searches.
Crawler-based search engines have three major components.

1) The crawler
Also called the spider, it visits a web page, reads it, and then follows links to other pages within the site. The spider returns to the site on a regular basis, such as every month or every fifteen days, to look for changes.

2) The index
Everything the spider finds goes into the second part of the search engine, the index. The index contains a copy of every web page that the spider finds. If a web page changes, the index is updated with the new information.

3) The search engine software
This is the software program that accepts the user-entered query, interprets it, sifts through the millions of pages recorded in the index to find matches, ranks them in order of what it believes is most relevant, and presents them to the user in a customizable manner.
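The index and query-matching steps above can be sketched with a toy inverted index. This is a hypothetical illustration, not the code of any real engine; the page texts and ranking (none here beyond set intersection) are made up for the example:

```python
from collections import defaultdict

def build_index(pages):
    """Map each word to the set of page IDs containing it (a toy inverted index)."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in text.lower().split():
            index[word].add(page_id)
    return index

def search(index, query):
    """Return the page IDs that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

# Illustrative page texts standing in for crawled web pages.
pages = {
    "page1": "search engines crawl the web",
    "page2": "a web crawler visits pages",
    "page3": "engines rank results",
}
index = build_index(pages)
print(search(index, "web"))  # pages whose text contains "web"
```

A real index stores far more (positions, link data, ranking signals), but the core lookup is the same: query words map to candidate pages, which are then ranked.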
Crawler-Based Search Engines contd..

A Web crawler is one type of bot, or software agent. In general, it starts with a list of URLs to visit, called the seeds. As the crawler visits these URLs, it identifies all the hyperlinks in the page and adds them to the list of URLs to visit, called the crawl frontier. URLs from the frontier are recursively visited according to a set of policies.

The large volume of the web implies that the crawler can only download a limited number of Web pages within a given time, so it needs to prioritize its downloads. The high rate of change implies that pages might already have been updated or even deleted by the time the crawler reaches them.
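The seed-and-frontier process described above can be sketched as a simple traversal. This is a simplified illustration: the in-memory `WEB` dictionary stands in for real HTTP fetching and HTML link extraction, and `max_pages` is a stand-in for the download budget that forces real crawlers to prioritize:

```python
from collections import deque

# A tiny in-memory "web": each URL maps to the hyperlinks found on that page.
WEB = {
    "http://a.example": ["http://b.example", "http://c.example"],
    "http://b.example": ["http://c.example"],
    "http://c.example": ["http://a.example"],
}

def crawl(seeds, max_pages=100):
    """Visit URLs starting from the seeds, adding discovered links to the frontier."""
    frontier = deque(seeds)   # the crawl frontier: URLs waiting to be visited
    visited = set()
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        if url in visited:
            continue
        visited.add(url)                   # "download" the page
        for link in WEB.get(url, []):      # identify hyperlinks in the page
            if link not in visited:
                frontier.append(link)      # grow the frontier
    return visited

print(crawl(["http://a.example"]))
```

Real crawlers replace the plain queue with a priority queue driven by the selection policy, which is how the prioritization mentioned above is implemented.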
Crawlers can automate maintenance tasks on a Web site, such as checking links or validating HTML code.

Crawlers can also be used to gather specific types of information from Web pages, such as harvesting e-mail addresses (usually for sending spam).
Architecture of Web Crawler
Behaviour of a Web Crawler
The behaviour of a Web crawler is the outcome of a combination of policies:

a selection policy that states which pages to download,

a re-visit policy that states when to check for changes to the pages,

a politeness policy that states how to avoid overloading Web sites, and

a parallelization policy that states how to coordinate distributed Web crawlers.
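The four policies can be pictured as configurable parameters of a crawler. The field names and default values here are purely illustrative, not taken from any real crawler:

```python
from dataclasses import dataclass

@dataclass
class CrawlPolicy:
    """Illustrative knobs corresponding to the four policy types above."""
    max_depth: int = 3            # selection: how far beyond the seeds to follow links
    revisit_after_days: int = 15  # re-visit: when to check a page for changes
    delay_seconds: float = 1.0    # politeness: pause between requests to one host
    num_workers: int = 4          # parallelization: concurrent crawler processes

# A crawler operator tuning the politeness policy for a slow site:
policy = CrawlPolicy(delay_seconds=2.0)
print(policy)
```

In practice these settings interact: a stricter politeness delay lowers throughput, which in turn forces the selection policy to be choosier about which pages are worth downloading.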
2. Human-Powered Directories

A human-edited directory is created and maintained by editors who add links based on the policies particular to that directory.

Some directories may prevent search engines from rating a displayed link by using redirects, nofollow attributes, or other techniques.

Many human-edited directories, including the Open Directory Project, Salehoo and the World Wide Web Virtual Library, are edited by volunteers, who are often experts in particular categories.

These directories are sometimes criticized for long delays in approving submissions, or for rigid organizational structures and disputes among volunteer editors.
In response to these criticisms, some volunteer-edited directories have adopted wiki technology to allow broader community participation in editing the directory (at the risk of introducing lower-quality, less objective entries).

Another direction taken by some web directories is the paid-for-inclusion model.

This method enables the directory to offer timely inclusion for submissions and, as a result of the paid model, generally fewer listings.

These options typically carry an additional fee, but offer significant help and visibility to the submitted sites.
Today, submission of websites to web directories is considered a common SEO (search engine optimization) technique to get back-links for the submitted web site.

One distinctive feature of directory submission is that it cannot be fully automated like search engine submissions.

Manual directory submission is a tedious and time-consuming job and is often outsourced by webmasters.
3. Hybrid Search Engines (Mixed Results)

In the web's early days, a search engine either presented crawler-based results or human-powered listings. Today, it is extremely common for both types of results to be presented. Usually, a hybrid search engine will favour one type of listing over the other.

For example, MSN Search is more likely to present human-powered listings from LookSmart. However, it also presents crawler-based results (as provided by Inktomi), especially for more obscure queries.