
The anatomy of a Large-Scale Hypertextual Web Search Engine

Date post: 22-Feb-2016
Upload: tybalt
Page 1: The anatomy of a Large-Scale Hypertextual Web Search Engine

The anatomy of a Large-Scale Hypertextual Web Search Engine

Page 2: The anatomy of a Large-Scale Hypertextual Web Search Engine

What we want from a search engine.

•Speed
•Quantity of Results
•Efficient Storage Space
•Quality of Results

Google attempts to deliver all of these aspects of search.

Page 3: The anatomy of a Large-Scale Hypertextual Web Search Engine

Precision of result:

•Page Rank
•Anchor Text

Second Generation Search Engine

Page 4: The anatomy of a Large-Scale Hypertextual Web Search Engine

Page Rank

PR(A) = (1-d) + d(PR(T1)/C(T1) + … + PR(Tn)/C(Tn))

The more links point to a page from other pages, the higher its Page Rank will be. Page Rank can be read as the probability that a random internet surfer will reach this page by randomly clicking links.

Page Rank is also determined by the rank of the pages that link to you. The more links point to page A, the more a link from page A to page B is worth. That value is shared among all of A's outgoing links, because each PR(Ti) in the formula is divided by C(Ti), the number of links going out of Ti.
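
The formula above can be evaluated by repeated substitution. The following is a minimal sketch, not Google's implementation: the three-page toy web, the function name page_rank, the damping factor d = 0.85, and the fixed iteration count are all assumptions made for illustration.

```python
def page_rank(links, d=0.85, iterations=50):
    """Iterate PR(A) = (1-d) + d * sum(PR(T)/C(T)) over every page T linking to A.

    `links` maps each page to the set of pages it links to (hypothetical toy input).
    C(T) is the number of links going out of T.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    rank = {page: 1.0 for page in pages}  # assumed starting value
    for _ in range(iterations):
        new_rank = {}
        for page in pages:
            incoming = sum(
                rank[src] / len(targets)
                for src, targets in links.items()
                if page in targets
            )
            new_rank[page] = (1 - d) + d * incoming
        rank = new_rank
    return rank


# Toy web: A links to B and C, B links to C, C links to A.
toy_web = {"A": {"B", "C"}, "B": {"C"}, "C": {"A"}}
ranks = page_rank(toy_web)
for page in sorted(ranks):
    print(page, round(ranks[page], 3))
```

In this toy web, C is linked from both A and B, while B only receives half of A's vote, so C ends up ranked above B.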

Page 5: The anatomy of a Large-Scale Hypertextual Web Search Engine

Anchor Text

Every link on the web carries some text alongside it, called anchor text. The page creator writes this text to explain what the link does, where it leads, or what it attempts to explain. Google associates the anchor text with the page the link points to, so by collecting the anchor text of links from hundreds of different sites it can provide more relevant search results.
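
One way to picture this is an index keyed by the words in the link text that points at the pages being linked to. The sketch below is only an illustration under assumptions: the example.com-style hosts, the link triples, and the function names build_anchor_index and lookup are hypothetical.

```python
from collections import defaultdict

# Hypothetical (source page, anchor text, target page) triples.
links = [
    ("site-a.example", "cheap flights", "travel.example"),
    ("site-b.example", "flight deals", "travel.example"),
    ("site-c.example", "python tutorial", "learn.example"),
]


def build_anchor_index(links):
    """Map each anchor-text word to the target pages it describes."""
    index = defaultdict(lambda: defaultdict(int))
    for _source, anchor_text, target in links:
        for word in anchor_text.lower().split():
            index[word][target] += 1
    return index


def lookup(index, word):
    """Return target pages, most frequently described first."""
    hits = index.get(word.lower(), {})
    return sorted(hits, key=lambda url: (-hits[url], url))


anchor_index = build_anchor_index(links)
print(lookup(anchor_index, "flights"))
```

Note that the target page is found through words written on other sites, even if the target page never uses those words itself.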

Page 6: The anatomy of a Large-Scale Hypertextual Web Search Engine

Proximity Search and Others

Google keeps track of how close the related words are to each other and also keeps track of their visual presentation (font size, color, boldness, etc.).
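
As a rough illustration of proximity, the sketch below records the position of every word in a document and then finds the smallest gap between two query words. The helper names word_positions and min_distance and the sample sentence are assumptions; visual presentation is not modeled here.

```python
def word_positions(text):
    """Map each word to the sorted list of offsets where it occurs."""
    positions = {}
    for offset, word in enumerate(text.lower().split()):
        positions.setdefault(word, []).append(offset)
    return positions


def min_distance(positions_a, positions_b):
    """Smallest gap between any occurrence of word A and any occurrence of word B.

    Both inputs must be sorted lists of word offsets from the same document.
    Returns None if either list is empty.
    """
    i = j = 0
    best = None
    while i < len(positions_a) and j < len(positions_b):
        gap = abs(positions_a[i] - positions_b[j])
        if best is None or gap < best:
            best = gap
        # Advance whichever list currently points at the smaller offset.
        if positions_a[i] < positions_b[j]:
            i += 1
        else:
            j += 1
    return best


doc = "large scale web search engine design for the web"
pos = word_positions(doc)
print(min_distance(pos["web"], pos["search"]))
```

A document where the query words sit next to each other would then score better than one where they are far apart.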

Page 7: The anatomy of a Large-Scale Hypertextual Web Search Engine
Page 8: The anatomy of a Large-Scale Hypertextual Web Search Engine
Page 9: The anatomy of a Large-Scale Hypertextual Web Search Engine

Crawling and Indexing

•Google typically ran about 3 crawlers.
•Each crawler opens roughly 300 connections at once.
•At peak performance, with 4 crawlers, Google can crawl 100 web pages per second.
•Roughly 600K per second of data.
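
The crawl figures above imply a few derived numbers. The arithmetic below is a back-of-the-envelope sketch: it assumes "600K" means 600 kilobytes, that all four crawlers hold 300 connections at the same time, and that the peak rate is sustained for a whole day. The variable names are only for illustration.

```python
crawlers = 4
connections_per_crawler = 300   # "roughly 300" per the slide
pages_per_second = 100          # peak rate with 4 crawlers
data_kb_per_second = 600        # assumption: "600K" read as 600 kilobytes

# Upper bound on simultaneously open connections.
total_connections = crawlers * connections_per_crawler

# Average size of a fetched page implied by the two peak figures.
avg_page_kb = data_kb_per_second / pages_per_second

# Pages fetched in one day if the peak rate were sustained (an idealization).
pages_per_day = pages_per_second * 60 * 60 * 24

print(total_connections, avg_page_kb, pages_per_day)
```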

•Parsing
•Indexing documents into barrels
•Sorting
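
The three steps can be sketched end to end on a toy scale. This is a simplified illustration, not the original system: the function names, the two-document corpus, and the rule for choosing a barrel (first character of the word) are all assumptions made for the example.

```python
from collections import defaultdict


def parse(doc_id, text):
    """Parsing: turn a document into (doc_id, word, position) hits."""
    return [(doc_id, word, pos) for pos, word in enumerate(text.lower().split())]


def index_into_barrels(hits, num_barrels=2):
    """Indexing: distribute hits into barrels.

    Toy rule (assumption): the barrel is chosen from the word's first character,
    so every occurrence of a word lands in the same barrel.
    """
    barrels = defaultdict(list)
    for doc_id, word, pos in hits:
        barrels[ord(word[0]) % num_barrels].append((doc_id, word, pos))
    return barrels


def sort_barrel(barrel):
    """Sorting: re-order a barrel by word so it reads as an inverted index."""
    inverted = defaultdict(list)
    for doc_id, word, pos in sorted(barrel, key=lambda h: (h[1], h[0], h[2])):
        inverted[word].append((doc_id, pos))
    return dict(inverted)


docs = {1: "web search engine", 2: "search the web"}
hits = [hit for doc_id, text in docs.items() for hit in parse(doc_id, text)]

inverted_index = {}
for barrel in index_into_barrels(hits).values():
    inverted_index.update(sort_barrel(barrel))

print(inverted_index["search"])
```

After sorting, looking up a word gives the documents and positions where it occurs, which is the form a query needs.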

