8/2/2019 Search Engine Presentation Final
http://slidepdf.com/reader/full/search-engine-presentation-final 1/25
Presented by:
Rasik Mevada
Vishal Dabhi
Vimal Nair
Ravi Mathai
Search Engines
INTRODUCTION
HISTORY
TYPES OF SEARCH ENGINE
HOW SEARCH ENGINES WORK?
CONCLUSION
Introduction

A search engine is a software program that searches for sites based on the words that you designate as search terms.

Search engines look through their own databases of information in order to find what you are looking for.

Some search engines also mine data available in databases or open directories. Unlike web directories, which are maintained only by human editors, search engines also maintain real-time information by running an algorithm on a web crawler.
Introduction Contd…

A web search engine is designed to search for information on the World Wide Web.

The search results are generally presented in a list of results, often referred to as search engine results pages (SERPs).

The results may consist of web pages, images, and other types of files.

"Search engine" is the popular term for an Information Retrieval (IR) system.
History

In 1990, the first tool for searching on the internet was "Archie".

The program downloaded the directory listings of all the files located on public anonymous FTP (File Transfer Protocol) sites, creating a searchable database of file names.

However, Archie did not index the contents of these sites, since the amount of data was so limited that it could be readily searched manually.
In 1991, the rise of Gopher led to two new search programs, Veronica and Jughead.

Like Archie, they searched the file names and titles stored in Gopher index systems. Veronica (Very Easy Rodent-Oriented Net-wide Index to Computerized Archives) provided a keyword search of most Gopher menu titles in the entire Gopher listings.

Jughead (Jonzy's Universal Gopher Hierarchy Excavation And Display) was a tool for obtaining menu information from specific Gopher servers.
In June 1993, Matthew Gray, then at MIT, produced what was probably the first web robot, the Perl-based World Wide Web Wanderer, and used it to generate an index called "Wandex".

The web's second search engine, Aliweb, appeared in November 1993. Aliweb did not use a web robot, but instead depended on being notified by website administrators of the existence at each site of an index file in a particular format.

One of the first "full text" crawler-based search engines was WebCrawler, which came out in 1994.
SEARCH ENGINE TYPES

Crawler-Based Search Engines

Human-Powered Directories

Hybrid Search Engines (Mixed Results)
1. Crawler-Based Search Engines

A Web crawler is a computer program that browses the World Wide Web in a methodical, automated manner. Other terms for Web crawlers are ants, automatic indexers, bots, Web spiders, Web robots, or, especially in the FOAF community, Web scutters.

This process is called Web crawling or spidering. Many sites, in particular search engines, use spidering as a means of providing up-to-date data. Web crawlers are mainly used to create a copy of all the visited pages for later processing by a search engine that will index the downloaded pages to provide fast searches.
Crawler-based search engines have three major components.

1) The crawler
Also called the spider, it visits a web page, reads it, and then follows links to other pages within the site. The spider returns to the site on a regular basis, such as every month or every fifteen days, to look for changes.

2) The index
Everything the spider finds goes into the second part of the search engine, the index. The index contains a copy of every web page that the spider finds. If a web page changes, the index is updated with the new information.

3) The search engine software
This is the software program that accepts the user-entered query, interprets it, sifts through the millions of pages recorded in the index to find matches, ranks them in order of what it believes is most relevant, and presents them to the user in a customizable manner.
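The index and query-matching steps above can be sketched with a toy inverted index. This is a hypothetical illustration, not the code of any real engine; the page texts and ranking (none here beyond set intersection) are made up for the example:

```python
from collections import defaultdict

def build_index(pages):
    """Map each word to the set of page IDs containing it (a toy inverted index)."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in text.lower().split():
            index[word].add(page_id)
    return index

def search(index, query):
    """Return the page IDs that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

# Illustrative page texts standing in for crawled web pages.
pages = {
    "page1": "search engines crawl the web",
    "page2": "a web crawler visits pages",
    "page3": "engines rank results",
}
index = build_index(pages)
print(search(index, "web"))  # pages whose text contains "web"
```

A real index stores far more (positions, link data, ranking signals), but the core lookup is the same: query words map to candidate pages, which are then ranked.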
Crawler-Based Search Engines contd..

A Web crawler is one type of bot, or software agent. In general, it starts with a list of URLs to visit, called the seeds. As the crawler visits these URLs, it identifies all the hyperlinks in the page and adds them to the list of URLs to visit, called the crawl frontier. URLs from the frontier are recursively visited according to a set of policies.

The large volume of the web implies that the crawler can only download a limited number of Web pages within a given time, so it needs to prioritize its downloads. The high rate of change implies that pages might already have been updated or even deleted by the time the crawler reaches them.
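The seed-and-frontier process described above can be sketched as a simple traversal. This is a simplified illustration: the in-memory `WEB` dictionary stands in for real HTTP fetching and HTML link extraction, and `max_pages` is a stand-in for the download budget that forces real crawlers to prioritize:

```python
from collections import deque

# A tiny in-memory "web": each URL maps to the hyperlinks found on that page.
WEB = {
    "http://a.example": ["http://b.example", "http://c.example"],
    "http://b.example": ["http://c.example"],
    "http://c.example": ["http://a.example"],
}

def crawl(seeds, max_pages=100):
    """Visit URLs starting from the seeds, adding discovered links to the frontier."""
    frontier = deque(seeds)   # the crawl frontier: URLs waiting to be visited
    visited = set()
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        if url in visited:
            continue
        visited.add(url)                   # "download" the page
        for link in WEB.get(url, []):      # identify hyperlinks in the page
            if link not in visited:
                frontier.append(link)      # grow the frontier
    return visited

print(crawl(["http://a.example"]))
```

Real crawlers replace the plain queue with a priority queue driven by the selection policy, which is how the prioritization mentioned above is implemented.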
Crawlers can automate maintenance tasks on a Web site, such as checking links or validating HTML code.

Crawlers can also be used to gather specific types of information from Web pages, such as harvesting e-mail addresses (usually for sending spam).
Architecture of Web Crawler
Behaviour of a Web Crawler
The behaviour of a Web crawler is the outcome of a combination of policies:

a selection policy that states which pages to download,

a re-visit policy that states when to check for changes to the pages,

a politeness policy that states how to avoid overloading Web sites, and

a parallelization policy that states how to coordinate distributed Web crawlers.
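The four policies can be pictured as configurable parameters of a crawler. The field names and default values here are purely illustrative, not taken from any real crawler:

```python
from dataclasses import dataclass

@dataclass
class CrawlPolicy:
    """Illustrative knobs corresponding to the four policy types above."""
    max_depth: int = 3            # selection: how far beyond the seeds to follow links
    revisit_after_days: int = 15  # re-visit: when to check a page for changes
    delay_seconds: float = 1.0    # politeness: pause between requests to one host
    num_workers: int = 4          # parallelization: concurrent crawler processes

# A crawler operator tuning the politeness policy for a slow site:
policy = CrawlPolicy(delay_seconds=2.0)
print(policy)
```

In practice these settings interact: a stricter politeness delay lowers throughput, which in turn forces the selection policy to be choosier about which pages are worth downloading.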
2. Human-Powered Directories

A human-edited directory is created and maintained by editors who add links based on the policies particular to that directory.

Some directories may prevent search engines from rating a displayed link by using redirects, nofollow attributes, or other techniques.

Many human-edited directories, including the Open Directory Project, Salehoo and the World Wide Web Virtual Library, are edited by volunteers, who are often experts in particular categories.

These directories are sometimes criticized for long delays in approving submissions, or for rigid organizational structures and disputes among volunteer editors.
In response to these criticisms, some volunteer-edited directories have adopted wiki technology to allow broader community participation in editing the directory (at the risk of introducing lower-quality, less objective entries).

Another direction taken by some web directories is the paid-for-inclusion model.

This method enables the directory to offer timely inclusion for submissions and, as a result of the paid model, generally fewer listings.

These options typically carry an additional fee, but offer significant help and visibility to the submitted sites.
Today, submission of websites to web directories is considered a common SEO (search engine optimization) technique to get back-links for the submitted web site.

One distinctive feature of directory submission is that it cannot be fully automated like search engine submissions.

Manual directory submission is a tedious and time-consuming job and is often outsourced by webmasters.
3. Hybrid Search Engines (Mixed Results)

In the web's early days, a search engine either presented crawler-based results or human-powered listings. Today, it is extremely common for both types of results to be presented. Usually, a hybrid search engine will favour one type of listing over the other.

For example, MSN Search is more likely to present human-powered listings from LookSmart. However, it also presents crawler-based results (as provided by Inktomi), especially for more obscure queries.