
New understanding website

Date post: 05-Dec-2014
Upload: umrella
 
Page 1: New understanding website

Understanding Websites

Page 2: New understanding website

Trainings by Vidya Bhagwat

• Websites: A website is hosted on at least one web server and is accessible via a network, such as the Internet or a private local area network, through an Internet address known as a Uniform Resource Locator (URL). All publicly accessible websites collectively constitute the World Wide Web.

Page 3: New understanding website

Importance Of Websites:

• Internet marketing comes of age

• Internet marketing is now a major, multi-billion dollar industry.

• Despite some concerns, many consumers now have the skills and the confidence to transact purchases using the web.

Page 4: New understanding website

• An Internet "presence" has now become essential.

• A modern, well-presented website is now expected of most businesses and organizations.

• A website should explain the products and services offered. It should also provide background and general contact information.

Page 5: New understanding website

• Local business is affected as well

• Many small business operators have been disappointed with the results achieved by their websites.

• Sites have been created, but little if any business has resulted.

• There are a number of reasons:

• unrealistic expectations;

• poor website construction (not search engine friendly);

• poor targeting.

• Local search is growing in importance. Local search is the ability to search for and find businesses and organizations in the local area, that is, in close geographic proximity.

• This will vary from business to business.

Page 6: New understanding website

Website Structure Understanding:

• Website Structure Understanding and its Applications.

• Website structure understanding can be treated as reverse engineering: automatically discovering the layout templates and URL patterns of a website, and understanding how these templates and patterns are combined to organize the site. The study of this problem has had a great impact on many applications that can leverage such site-level knowledge to help web search and data mining.

Page 7: New understanding website

Page 8: New understanding website

• What’s Website Structure? In this project, the website structure consists of three components: layout templates, URL patterns, and linkage structure.

• Layout Template: Most web pages consist of HTML elements such as tables, menus, buttons, images, and input boxes. The layout of a web page describes which HTML elements the page includes and how they are visually arranged when the page is rendered. Essentially, a page layout is represented by a so-called DOM (Document Object Model) tree. In this project, a layout template is a group of pages that have very similar layouts (DOM trees).

Page 9: New understanding website

• In a website, pages are generated from distinguishable templates according to their functions. That is to say, visually similar pages usually have the same function, so a user can identify a page’s function at a glance. Following are several typical layout templates identified from the ASP.NET Forums. Their functions are to show a) a list of discussion threads, b) a list of thread posts, and c) a user profile, respectively.

Page 10: New understanding website

• Note that one layout template can have more than one related URL pattern. For example, a bookseller website usually designs one template to show a list of books and provides different query parameters to generate such a list. The various query parameters lead to different URL patterns, but the search results are shown with the same template. Another common case is duplicate pages, i.e., pages with the same content (and very likely the same layout) but different URLs.

Page 11: New understanding website

• Link Structure: Based on the layout templates and URL patterns, we can construct a directed graph to represent the organization of the website. Each layout template is a node in the graph, and two nodes are linked if there are hyperlinks between the pages belonging to them. The link direction is the same as that of the underlying hyperlinks, and each link is labeled with the URL pattern of the corresponding hyperlink URLs. Note that there can be multiple links from one node to another if the corresponding hyperlinks have more than one URL pattern.

• Fig. 2 gives an illustrative example of the sub-graph constructed based on the layout templates and URL patterns above.
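
The graph construction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `build_site_graph`, the template labels, and the input shapes (a page-to-template mapping and a list of hyperlinks already annotated with a URL pattern) are hypothetical and do not come from the slides.

```python
from collections import defaultdict

def build_site_graph(page_template, hyperlinks):
    """Build a directed graph whose nodes are layout templates.

    page_template: dict mapping a page URL to its template label.
    hyperlinks: iterable of (source_url, target_url, url_pattern) tuples.
    Returns a dict mapping (source_template, target_template) to the set
    of URL patterns seen on hyperlinks between those two templates.
    """
    graph = defaultdict(set)
    for source, target, pattern in hyperlinks:
        # Skip links that touch pages we have no template for.
        if source in page_template and target in page_template:
            edge = (page_template[source], page_template[target])
            graph[edge].add(pattern)
    return dict(graph)
```

An edge whose set holds more than one pattern corresponds to the "multiple links from one node to another" case mentioned above.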

Page 12: New understanding website

• Random Sampling: The goal of random sampling is to provide a snapshot of a website by downloading only a relatively small number of pages. The sampling quality is the foundation of the whole mining process. To keep the downloaded pages as diverse as possible, in practice the sampling process combines breadth-first and depth-first strategies, so it can reach pages at deep levels within a few steps.
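
One way to mix breadth-first and depth-first traversal is to choose at random, at every step, which end of the frontier to take the next page from. The sketch below is an assumption about how such a sampler could look, not the project's actual strategy. The name `sample_pages`, the `depth_bias` parameter, and the in-memory `links` dictionary (standing in for downloading and parsing pages) are all hypothetical.

```python
import random
from collections import deque

def sample_pages(links, start, budget, depth_bias=0.5, seed=0):
    """Sample up to `budget` pages reachable from `start`.

    links: dict mapping a URL to the list of URLs it links to
           (a stand-in for fetching a page and extracting its links).
    depth_bias: probability of taking a depth-first step.
    """
    rng = random.Random(seed)
    frontier = deque([start])
    seen = {start}
    sampled = []
    while frontier and len(sampled) < budget:
        # Right end holds the newest links (depth-first step);
        # left end holds the oldest links (breadth-first step).
        if rng.random() < depth_bias:
            url = frontier.pop()
        else:
            url = frontier.popleft()
        sampled.append(url)
        for target in links.get(url, []):
            if target not in seen:
                seen.add(target)
                frontier.append(target)
    return sampled
```

Raising `depth_bias` pushes the sample toward deep pages sooner, while lowering it favors coverage of pages near the start page.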

Page 13: New understanding website

• Inspired by this observation, this project uses DOM paths to characterize the layout of a web page. As shown in Fig. 5, a DOM path is a path from a leaf node to the root of the DOM tree. The leaf node indicates the component type, and the path to the root approximately describes the visual location of that component in the rendered page.

• Given a set of HTML pages, all unique DOM paths are extracted to form a feature space. Each page is represented by a point in the feature space, and the layout similarity of two pages can then be estimated by comparing their points. A bottom-up strategy is then used to group similar pages, and each cluster is considered a layout template.
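
The DOM-path idea can be sketched with Python's standard `html.parser` module. The slides do not say which similarity measure or clustering rule the project uses, so this sketch assumes Jaccard similarity over sets of DOM paths and a simple single-linkage bottom-up merge with a hypothetical `threshold`. Paths are written root-to-leaf (for example `html/body/ul/li/a`), which carries the same information as the leaf-to-root form described above. The names `dom_paths`, `layout_similarity`, and `group_pages` are illustrative only.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they are always leaves.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}

class DomPathCollector(HTMLParser):
    """Collects the unique root-to-leaf tag paths of one page."""

    def __init__(self):
        super().__init__()
        self.stack = []      # currently open tags
        self.has_child = []  # parallel to stack: has this element a child?
        self.paths = set()

    def handle_starttag(self, tag, attrs):
        if self.has_child:
            self.has_child[-1] = True
        if tag in VOID_TAGS:
            self.paths.add("/".join(self.stack + [tag]))
            return
        self.stack.append(tag)
        self.has_child.append(False)

    def handle_endtag(self, tag):
        if tag not in self.stack:
            return  # stray closing tag, ignore it
        while self.stack:
            top = self.stack.pop()
            is_leaf = not self.has_child.pop()
            if is_leaf:
                self.paths.add("/".join(self.stack + [top]))
            if top == tag:
                break

def dom_paths(html):
    collector = DomPathCollector()
    collector.feed(html)
    collector.close()
    return collector.paths

def layout_similarity(paths_a, paths_b):
    """Jaccard similarity of two DOM-path sets (an assumed measure)."""
    if not paths_a and not paths_b:
        return 1.0
    return len(paths_a & paths_b) / len(paths_a | paths_b)

def group_pages(pages, threshold=0.6):
    """Bottom-up grouping: start with one cluster per page and keep
    merging the most similar pair until none reaches the threshold."""
    features = {name: dom_paths(html) for name, html in pages.items()}
    clusters = [[name] for name in pages]
    while True:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sim = max(layout_similarity(features[a], features[b])
                          for a in clusters[i] for b in clusters[j])
                if sim >= threshold and (best is None or sim > best[0]):
                    best = (sim, i, j)
        if best is None:
            return clusters
        _, i, j = best
        clusters[i].extend(clusters.pop(j))
```

Each returned cluster plays the role of one layout template. The pairwise search is quadratic in the number of pages, which is acceptable for a small sample but would need a faster scheme on a large one.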

Page 14: New understanding website

• URL Pattern Discovery: A URL is not an ordinary string; it has a syntax structure strictly defined by W3C standards. Based on this syntax structure, a URL string can be represented by a group of key-value pairs. Fig. 6 gives an example URL, its syntax structure, and the corresponding key-value pairs.
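
As a rough illustration of the key-value view of a URL, the sketch below uses Python's standard `urllib.parse` module. The key names (`scheme`, `host`, `path_0`, `path_1`, ...) and the example URL are assumptions chosen for this sketch; the figure referenced above may use a different naming scheme.

```python
from urllib.parse import urlsplit, parse_qsl

def url_to_pairs(url):
    """Decompose a URL into an ordered list of (key, value) pairs."""
    parts = urlsplit(url)
    pairs = [("scheme", parts.scheme), ("host", parts.hostname or "")]
    # Each path segment gets a positional key: path_0, path_1, ...
    segments = [segment for segment in parts.path.split("/") if segment]
    for depth, segment in enumerate(segments):
        pairs.append((f"path_{depth}", segment))
    # Query parameters already come as key-value pairs.
    for key, value in parse_qsl(parts.query, keep_blank_values=True):
        pairs.append((key, value))
    return pairs
```

For a hypothetical URL such as `http://www.example.net/forums/thread.aspx?id=42&page=2`, the pairs would cover the scheme, the host, the two path segments, and the `id` and `page` parameters.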

Page 15: New understanding website

• Different URL components (or keys) usually have different functions and play different roles in a website. In general, keys denoting directories, functions, and document types take only a few values, which should be recorded explicitly in a URL pattern. By contrast, keys denoting parameters such as user names take quite diverse values, which should be generalized in the pattern. Based on this observation, this project proposes a top-down recursive split process that constructs a pattern tree to characterize a set of URLs. Fig. 7 gives an example pattern tree based on URLs from www.wretch.cc. Algorithm details please refer to.
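
The slides do not give the split algorithm itself, so the following is only a simplified stand-in that follows the stated intuition: split on keys that take few values, and generalize keys that take many. It assumes each URL has already been turned into a dictionary of key-value pairs. The function name `build_patterns`, the `max_values` cutoff, and the `"*"` wildcard are hypothetical, and the result is returned as the list of leaf patterns rather than as an explicit tree.

```python
def build_patterns(urls, max_values=2):
    """Top-down recursive split over a list of key-value dicts.

    Keys with at most `max_values` distinct values are kept explicitly
    and used to split the URL set; the remaining keys are generalized
    to the wildcard "*".
    """
    def split(group, fixed):
        keys = sorted({key for url in group for key in url} - set(fixed))
        candidates = []
        for key in keys:
            distinct = len({url.get(key) for url in group})
            if distinct <= max_values:
                candidates.append((distinct, key))
        if not candidates:
            # Every remaining key is too diverse: emit one leaf pattern.
            pattern = dict(fixed)
            pattern.update({key: "*" for key in keys})
            return [pattern]
        # Split on the key with the fewest distinct values.
        _, key = min(candidates)
        patterns = []
        for value in sorted({url.get(key) for url in group}, key=str):
            subset = [url for url in group if url.get(key) == value]
            patterns += split(subset, {**fixed, key: value})
        return patterns

    return split(urls, {})
```

With a handful of thread and user-profile URLs as input, a directory-like key ends up fixed in each pattern while identifiers and user names collapse to `"*"`.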

Page 16: New understanding website

• Website Designing India have assisted hundreds of businesses to build or update a website customized to their requirements. You get more than just a website with our Website Designing Services. You can update your website content easily, take credit card payments online, and use tools such as poll managers, news managers, photo galleries, and form builders. Whether you're looking for an ecommerce web design company or a web development company that showcases your business, our website designing & development services give you control over your site with no technical skills needed.

Page 17: New understanding website

Domain Name:

• A domain name is a unique name that identifies a website. It is an identification string that defines a realm of administrative autonomy, authority, or control on the Internet. Domain names are formed by the rules of the Domain Name System (DNS). Any name registered in the DNS is a domain name.

Page 18: New understanding website

• Domain names are used in various networking contexts and application-specific naming and addressing purposes. In general, a domain name represents an Internet Protocol (IP) resource, such as a personal computer used to access the Internet, a server computer hosting a web site, or the web site itself or any other service communicated via the Internet. In 2010, the number of active domains reached 196 million.

Page 19: New understanding website

Use In Web Site Hosting

• The domain name is a component of a Uniform Resource Locator (URL) used to access web sites, for example:

• URL: http://www.example.net/index.html

• Top-level domain name: net

• Second-level domain name: example.net

Page 20: New understanding website

• Host name: www.example.net
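
The breakdown in the example above can be reproduced with Python's standard `urllib.parse` module. This is a minimal sketch with a hypothetical function name, `domain_levels`. It simply splits the host name on dots, so it assumes a single-label top-level domain and would mislabel hosts under multi-label suffixes such as `co.uk`.

```python
from urllib.parse import urlsplit

def domain_levels(url):
    """Return the host name and its top- and second-level domains."""
    host = urlsplit(url).hostname
    labels = host.split(".")
    return {
        "host_name": host,
        "top_level_domain": labels[-1],
        # Second-level domain written with its parent, as in the list above.
        "second_level_domain": ".".join(labels[-2:]),
    }
```

Calling `domain_levels("http://www.example.net/index.html")` yields `www.example.net`, `net`, and `example.net`, matching the three bullets above.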

• A domain name may point to multiple IP addresses in order to provide server redundancy for the services to be delivered; this multi-address capability is used to manage the traffic of large, popular web sites. More commonly, however, one server computer at a given IP address hosts web sites in several different domains. Such address overloading enables virtual web hosting, which large web hosting services commonly use to conserve IP address space. It is possible because HTTP version 1.1, unlike HTTP version 1.0, requires each request to identify the domain name being addressed.
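
In HTTP/1.1 the domain name travels in the `Host` request header, which is what lets one IP address serve several sites. The sketch below shows the idea with plain strings rather than a real server. The function names `build_request` and `pick_site`, the `sites` mapping, and the directory paths used with it are hypothetical, and the port stripping is naive (it does not handle IPv6 literals).

```python
def build_request(host, path="/"):
    """Build a minimal HTTP/1.1 GET request carrying a Host header."""
    return (f"GET {path} HTTP/1.1\r\n"
            f"Host: {host}\r\n"
            "Connection: close\r\n"
            "\r\n")

def pick_site(raw_request, sites, default="default"):
    """Choose the site for a request from its Host header.

    sites: dict mapping a lower-case host name to a site identifier.
    Requests without a Host header (possible in HTTP/1.0) get `default`.
    """
    header_lines = raw_request.split("\r\n")[1:]  # skip the request line
    for line in header_lines:
        if not line:
            break  # blank line ends the headers
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":
            host = value.strip().lower().split(":")[0]  # drop any port
            return sites.get(host, default)
    return default
```

A server sharing one address between `www.example.net` and `www.example.org` could use such a lookup to decide which site's content to return.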

Page 21: New understanding website

Contact Information

• To obtain further information about any of our databases, services, or programs, contact NCBI: PubMed Customer Service:

• Send an Email for help with technical issues, searching, or content assistance

• Call 1-888-FIND-NLM (1-888-346-3656) for help with searching or content assistance only

• General Information: [email protected]

• Questions about and technical support for NCBI and its programs and services

• BLAST: [email protected]

• Technical questions on running or interpreting BLAST sequence comparison searches

Page 22: New understanding website

Thank You

