8/14/2019 Search Engine Seminar
SIDHARTHA SARANGI
Roll: 064021
Regno: 0601292094
WEB SEARCH ENGINE
A web search engine is a tool designed to search for information on the World Wide Web.
The search results may consist of web pages, images, information and other types of files. Search engines work algorithmically or through a mixture of algorithmic and human input.
According to Netcraft, there are around 240,000,000 web domains globally.
Current search engines:
engine
First generation search engine:
Search results depended on what was on the Web page. Factors included keyword density, title, and where in the document keywords appeared.
The first generation also added relevancy for META tags, keywords in the domain name, and a few bonus points for having keywords in the URL.
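The on-page factors listed above can be illustrated with a small sketch. The bonus weights (0.5, 0.3, 0.1), the `page` dictionary layout and the function names are assumptions made for illustration; the slides do not give a formula.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def keyword_density(text, keyword):
    """Fraction of the words in `text` that equal `keyword` (0.0 for empty text)."""
    words = tokenize(text)
    if not words:
        return 0.0
    return words.count(keyword.lower()) / len(words)

def first_generation_score(page, keyword):
    """Toy on-page score: density plus assumed bonuses for title, META keywords and URL."""
    kw = keyword.lower()
    score = keyword_density(page["body"], kw)
    if kw in tokenize(page["title"]):
        score += 0.5  # assumed weight for a keyword in the title
    if kw in tokenize(page["meta_keywords"]):
        score += 0.3  # assumed weight for a keyword in the META tags
    if kw in page["url"].lower():
        score += 0.1  # assumed "bonus points" for a keyword in the URL
    return score

# Hypothetical page used only for this example.
page = {
    "url": "http://example.com/search-engine.html",
    "title": "Search engine basics",
    "meta_keywords": "search, engine, crawler",
    "body": "A search engine indexes pages. The engine answers queries.",
}

print(keyword_density(page["body"], "engine"))
print(first_generation_score(page, "engine"))
```

Because every factor comes from the page itself, a score like this is easy for a page author to inflate, which is one reason later generations looked at signals outside the page.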
Second generation search engine:
These engines track clicks, link popularity and link quality. They then added context, where two-word keyword pairs are extracted from a page to better categorize it.
Google's PageRank system and the use of visit length are evidence of second generation search engines.
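Link popularity can be sketched with a simplified PageRank-style power iteration. This is a textbook simplification, not Google's production system; the damping factor of 0.85, the iteration count and the four-page link graph are assumptions for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links: dict mapping a page to the list of pages it links to.
    Returns a dict mapping each page to its rank; ranks sum to 1."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with the "random jump" share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without out-links spreads its rank over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank

# Hypothetical link graph: "c" receives the most in-bound links, "d" receives none.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this graph the page with the most in-bound links ends up with the highest rank, and the page nobody links to ends up with the lowest.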
Third generation search engine:
It adds word stemming to keep a search in context. Automatic extraction of keyword pairs helps categorize a page. It extracts data about your individual searching habits. It adds Web maps, which are a useful filtering tool to get rid of duplicate sites.
works:
A search engine operates in the following order:
1. Web crawling
2. Indexing
3. Searching
Web crawling
Web search engines work by storing information about many web pages, which a Web crawler (spider) retrieves from the WWW. A crawler is an automated Web browser which follows every link it sees.
Googlebot is Google's web crawling robot. It functions like a web browser, by sending a request to a web server for a web page, downloading the entire page, then handing it off to Google's indexer.
Search engine spiders do not read pages the way a human does. Instead, they tend to see only particular elements and are blind to many extras (Flash, JavaScript, images) that are intended for humans.
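The "follow every link it sees" behaviour can be sketched as a breadth-first crawl. To keep the example self-contained it walks a hypothetical in-memory web (`FAKE_WEB`) instead of sending HTTP requests; the class and function names are assumptions, and this is not how Googlebot is implemented.

```python
from collections import deque
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

# Hypothetical in-memory "web": URL -> HTML. A real crawler would fetch over HTTP.
FAKE_WEB = {
    "/home": '<a href="/about">About</a> <a href="/news">News</a>',
    "/about": '<a href="/home">Home</a>',
    "/news": '<a href="/about">About</a> <a href="/missing">Gone</a>',
}

def crawl(start, fetch):
    """Breadth-first crawl from `start`. Returns {url: html} for every fetched page."""
    seen = {start}
    frontier = deque([start])
    pages = {}
    while frontier:
        url = frontier.popleft()
        html = fetch(url)
        if html is None:  # page could not be downloaded
            continue
        pages[url] = html  # this is where the page would be handed to the indexer
        parser = LinkExtractor()
        parser.feed(html)
        for link in parser.links:
            if link not in seen:  # visit each URL only once
                seen.add(link)
                frontier.append(link)
    return pages

pages = crawl("/home", FAKE_WEB.get)
print(list(pages))
```

The `seen` set is what stops the crawler from looping forever when pages link back to each other, as `/home` and `/about` do here.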
Spider simulator: Bput.org
As we can see, the images, Flash and JavaScript/VBScript do not have any impact on the web spider.
The only things that matter are text, in-bound/out-bound links, meta keywords, etc.
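A spider simulator of this kind can be approximated in a few lines. The sketch below keeps only text, links and meta keywords and skips scripts, styles and images; the `SpiderView` class and the sample HTML are hypothetical and are not taken from the Bput.org page.

```python
from html.parser import HTMLParser

class SpiderView(HTMLParser):
    """Keeps what a simple spider would see: text, links and meta keywords."""

    SKIPPED = {"script", "style"}  # content inside these tags is ignored

    def __init__(self):
        super().__init__()
        self.text = []
        self.links = []
        self.meta_keywords = ""
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in self.SKIPPED:
            self._skip_depth += 1
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "meta" and (attrs.get("name") or "").lower() == "keywords":
            self.meta_keywords = attrs.get("content") or ""
        # Other tags such as <img> or <object> contribute nothing to the view.

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.text.append(data.strip())

# Hypothetical page used only for this example.
html = """<html><head><title>Demo page</title>
<meta name="keywords" content="example, demo">
<script>var x = "hidden";</script></head>
<body><img src="logo.png"><h1>Welcome</h1>
<a href="/courses">Courses</a></body></html>"""

view = SpiderView()
view.feed(html)
view.close()
print(view.text)
print(view.links)
print(view.meta_keywords)
```

The script body and the image never reach the output, which mirrors the point of the slide: content delivered only through those elements is invisible to such a spider.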
Indexing
The Web crawler gives the indexer the full text of the pages it finds. These pages are stored in Google's index database by search term, with each index entry storing a list of documents in which the term appears and the location within the text where it occurs. This data structure allows rapid access to documents that contain user query terms.
To improve search performance, Google ignores stop words (such as the, is, on, or, of, how, why, as well as certain single digits and single letters).
The indexer also ignores some punctuation and multiple spaces, and converts all letters to lowercase, to improve Google's performance.
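The data structure described here is usually called an inverted index. A minimal sketch follows, assuming a tiny stop-word list taken from the slide and the simplification that all single-character tokens are dropped (the slide says only "certain" ones are); the function names and documents are hypothetical.

```python
import re
from collections import defaultdict

# Stop words named on the slide; real lists are longer.
STOP_WORDS = {"the", "is", "on", "or", "of", "how", "why"}

def tokenize(text):
    """Lowercase the text and drop punctuation and extra spaces."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """docs: {doc_id: text}. Returns {term: {doc_id: [positions in the text]}}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for position, term in enumerate(tokenize(text)):
            # Assumption: every single-character token is skipped.
            if term in STOP_WORDS or len(term) == 1:
                continue
            index[term].setdefault(doc_id, []).append(position)
    return dict(index)

# Hypothetical documents used only for this example.
docs = {
    1: "The spider crawls the Web.",
    2: "Web search is fast; the web is big!",
}
index = build_index(docs)
print(index["web"])
print("the" in index)
```

Looking up a term is a single dictionary access, which is why this layout gives rapid access to the documents that contain a query term.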
Searching:
The query processor has several parts, including the user interface (search box), the engine that evaluates queries and matches them to relevant documents, and the results formatter.
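The evaluation and formatting parts of the query processor can be sketched as two small functions. The AND semantics, the ranking by term count, and the `INDEX` and `TITLES` data are assumptions for illustration; the slides do not say how queries are matched or ranked.

```python
def evaluate(query, index):
    """Return the ids of documents containing every query term, best match first.
    `index` maps term -> {doc_id: number of occurrences}."""
    terms = query.lower().split()
    if not terms:
        return []
    postings = [index.get(term, {}) for term in terms]
    matching = set(postings[0])
    for posting in postings[1:]:
        matching &= set(posting)  # keep only documents that contain all terms
    # Assumed ranking: more occurrences first, ties broken by document id.
    return sorted(matching, key=lambda d: (-sum(p[d] for p in postings), d))

def format_results(doc_ids, titles):
    """Turn ranked document ids into numbered result lines."""
    return [f"{rank}. {titles[doc_id]}" for rank, doc_id in enumerate(doc_ids, start=1)]

# Hypothetical index and titles used only for this example.
INDEX = {"web": {1: 1, 2: 2}, "search": {2: 1, 3: 1}, "engine": {3: 2}}
TITLES = {1: "Spiders", 2: "Web search", 3: "Engines"}

for line in format_results(evaluate("web", INDEX), TITLES):
    print(line)
```

The search box itself is left out; in this sketch the string passed to `evaluate` stands in for what the user typed.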
engine:
3D search engine
Theme search engine
Meta search engine
It's time to look beyond Google.
3D search engine:
A search engine that can mine catalogs of three-dimensional objects and lets users create images as queries for searches.
Query formulation
Users can select objects from a catalog of images based on product groupings, or they can draw a 2D or 3D representation of the object they want to find.
Search process
It uses algorithms to convert the selected or drawn image-based query into a mathematical model. The search system then compares the mathematical description of the drawn or selected object to those of 3D objects stored in a database, looking for similarities in the described features.
Ex: Princeton 3D Model Search Engine http://shape.cs.princeton.edu/search.html
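The search process can be illustrated with a deliberately simple mathematical model: each shape is reduced to a histogram of point distances from its centroid, and shapes are compared by the distance between histograms. This descriptor is an assumption chosen for brevity and is not claimed to be the method used by the Princeton engine; all names and shapes below are hypothetical. It requires Python 3.8 or later for `math.dist`.

```python
import math

def descriptor(points, bins=4):
    """Histogram of point distances from the centroid, scaled so that the
    farthest point is at distance 1. Unaffected by translation and uniform scaling."""
    n = len(points)
    centroid = [sum(p[i] for p in points) / n for i in range(3)]
    dists = [math.dist(p, centroid) for p in points]
    longest = max(dists) or 1.0  # avoid dividing by zero for a single point
    hist = [0.0] * bins
    for d in dists:
        hist[min(int(d / longest * bins), bins - 1)] += 1.0 / n
    return hist

def search(query_points, database, top_k=1):
    """Return the names of the `top_k` stored shapes most similar to the query."""
    q = descriptor(query_points)
    scored = sorted((math.dist(q, descriptor(pts)), name) for name, pts in database.items())
    return [name for _, name in scored[:top_k]]

# Hypothetical database: the corners of a unit cube and five points along a rod.
cube = [(x, y, z) for x in (0, 1) for y in (0, 1) for z in (0, 1)]
rod = [(x, 0, 0) for x in range(5)]
database = {"cube": cube, "rod": rod}

# Query: a larger cube placed somewhere else in space.
big_cube = [(x, y, z) for x in (10, 12) for y in (10, 12) for z in (10, 12)]
print(search(big_cube, database))
```

Because the descriptor ignores position and size, the larger, shifted cube still matches the stored cube rather than the rod.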
Theme search engine:
It is also called 'in context' searching or on-topic searching.
What you say your page is about, what the search engine calculates your page to be about, and what the rest of the Internet thinks your page is about must match, according to the engine's mathematical formulas.
The 2nd and 3rd generation search engines are examples of theme search engines.
Meta search engine:
A meta-search engine is a search tool that sends user requests to several other search engines and/or databases and aggregates the results into a single list or displays them according to their source.
The Web is too large for any one search engine to index it all, and more comprehensive search results can be obtained by combining the results from several search engines. This may also save the user from having to use multiple search engines separately. It also helps in deep web searching.
Metasearch engines create what is known as a virtual database. They take a user's request, pass it to several other heterogeneous search engines and then compile the results.
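The "pass it on and compile the results" step can be sketched as follows. The two engines are hypothetical stubs that return fixed URL lists, and the reciprocal-rank voting used to merge them is an assumption; real metasearch engines may merge results differently.

```python
def metasearch(query, engines, limit=10):
    """Send `query` to every engine and merge the ranked URL lists into one list.
    Returns [(url, [names of the engines that returned it]), ...]."""
    scores = {}
    sources = {}
    for name, engine in engines.items():
        for rank, url in enumerate(engine(query), start=1):
            # Assumed merge rule: each engine votes 1/rank for a URL.
            scores[url] = scores.get(url, 0.0) + 1.0 / rank
            sources.setdefault(url, []).append(name)
    merged = sorted(scores, key=lambda url: (-scores[url], url))
    return [(url, sources[url]) for url in merged[:limit]]

# Hypothetical stub engines; real ones would query remote services.
def engine_alpha(query):
    return ["a.example/1", "shared.example", "a.example/2"]

def engine_beta(query):
    return ["shared.example", "b.example/1"]

results = metasearch("search engine", {"alpha": engine_alpha, "beta": engine_beta})
for url, found_by in results:
    print(url, found_by)
```

Keeping the list of source engines next to each URL supports both presentation styles mentioned above: a single merged list, or results grouped by their source.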
optimization
Search engine optimization (SEO) is the process of improving the volume or quality of traffic to a web site from search engines.
Current Optimization Strategies
1. Cloaking: Hide it from the spider's eye.
2. Keyword Weight: Use proper keywords.
4. Stop Words: Be careful with stop words.
5. Redundancy: Don't use the same pages again.
6. Lengthy Pages: Focus on one topic.