Web mining (1)

Post on 12-Apr-2017


WEB MINING

Contents
  The Web
  Web mining
  Data mining vs web mining
  Why mine the web
  Web mining taxonomy
  Applications of web mining
  Conclusion

The Web

The Web is a collection of inter-related files on one or more Web servers.

Wealth of information: present everywhere.

Structure: graph structure with links between pages.

Access: hundreds of millions of requests per day.
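The graph structure mentioned above can be pictured as a mapping from each page to the pages it links to. The sketch below is only an illustration: the page names and links are invented, and counting in-links is just one simple thing such a graph allows.

```python
from collections import defaultdict

# Hypothetical pages and links, invented for illustration only.
links = {
    "index.html": ["about.html", "products.html"],
    "about.html": ["index.html"],
    "products.html": ["index.html", "about.html"],
}


def in_link_counts(graph):
    """Count how many pages link to each page in the graph."""
    counts = defaultdict(int)
    for source, targets in graph.items():
        counts[source] += 0  # make sure pages with no in-links still appear
        for target in targets:
            counts[target] += 1
    return dict(counts)


counts = in_link_counts(links)
```

Here each key is a node and each list entry is a directed edge, so the same dictionary could feed other link-based analyses.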

Web Mining

Web mining is the use of data mining techniques to automatically discover and extract information from web documents.

It covers discovering useful information from the World-Wide Web and from its usage patterns.

Data Mining vs Web Mining

Traditional data mining

  Data is structured and relational.

  Well-defined tables, columns, rows, keys, and constraints.

Web data

  Semi-structured and unstructured.

  Rich in features and patterns.

Why Mine the Web?

Enormous wealth of information on the Web
  Financial information
  Book/CD/video stores
  Restaurant information
  Car prices

Lots of data on user access patterns
  Web logs contain the sequence of URLs accessed by users

Why is Web Mining Different?

The Web is more than a huge collection of documents; it also carries
  Hyperlink information
  Access and usage information

The Web is very dynamic
  New pages are constantly being generated

Web Mining Taxonomy

Web Mining
  Content Mining
    Text
    Image
    Video
    Audio
    Structured Record
  Structure Mining
    Hyperlink
      Inter-Document Hyperlink
      Intra-Document Hyperlink
    Document Structure
  Usage Mining
    Web Server Log
    Application Server Log
    Application Level Log

Web Content Mining

This is the process of mining useful information from the contents of Web pages and Web documents, which are mostly text, images, and audio/video files.
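For the text part of content mining, a minimal sketch might pull the visible text out of a page and count its terms. It uses only Python's standard `html.parser`. The sample page, the `TextExtractor` class, and the `term_frequencies` helper are all assumptions made for illustration, and real pages would need far more careful handling.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the bodies of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def term_frequencies(html):
    """Return a Counter of lowercase alphabetic terms in the page text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks).lower()
    return Counter(re.findall(r"[a-z]+", text))


# Hypothetical page, invented for illustration.
sample_page = (
    "<html><head><style>p {color: red}</style></head><body>"
    "<h1>Car prices</h1>"
    "<p>Compare car prices and restaurant information.</p>"
    "</body></html>"
)
freqs = term_frequencies(sample_page)
```

Term counts like these are only a starting point; images, audio, and video would need entirely different feature extraction.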

Web Structure Mining

Web structure mining is the process of discovering structure information from the Web.

This type of mining can be performed either at the document level or at the hyperlink level.

Web structure mining can be divided into two kinds:

1. Hyperlink: A hyperlink is a structural unit that connects a location in a web page to a different location, either within the same web page or on a different web page.

2. Document structure: The content within a web page can also be organized in a tree-structured format, based on the various HTML and XML tags within the page.
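Both kinds of structure can be sketched with Python's standard `html.parser`. The example below separates links that stay within the page from links to other pages, and tracks how deeply tags are nested. It is a simplified illustration: treating every `href` that starts with `#` as intra-document and everything else as inter-document is an assumption, the `StructureParser` class and sample page are hypothetical, and the depth tracking expects well-formed HTML.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they do not add nesting depth.
VOID_TAGS = {"br", "img", "hr", "meta", "link", "input"}


class StructureParser(HTMLParser):
    """Collect hyperlinks and measure the depth of the tag tree."""

    def __init__(self):
        super().__init__()
        self.intra_links = []  # links to a location in the same page
        self.inter_links = []  # links to a different page
        self.depth = 0
        self.max_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.depth += 1
            self.max_depth = max(self.max_depth, self.depth)
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                if href.startswith("#"):
                    self.intra_links.append(href)
                else:
                    self.inter_links.append(href)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.depth > 0:
            self.depth -= 1


# Hypothetical page, invented for illustration.
sample_page = (
    '<html><body><h1 id="top">Title</h1><ul>'
    '<li><a href="#top">Back to top</a></li>'
    '<li><a href="other.html">Other page</a></li>'
    "</ul><br></body></html>"
)
parser = StructureParser()
parser.feed(sample_page)
parser.close()
```

After parsing, `parser.intra_links` and `parser.inter_links` hold the two kinds of hyperlink, and `parser.max_depth` gives a rough measure of the document tree.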

Web Usage Mining

Web usage mining is the application of data mining techniques to discover interesting usage patterns from Web data.

Usage data captures the identity or origin of Web users along with their browsing behavior at a Web site.

Web usage mining can be classified further depending on the kind of usage data considered:

Web Server Data: The user logs are collected by the Web server. Typical data includes the IP address.

Application Server Data: Commercial application servers have significant features to enable e-commerce applications to be built on top of them with little effort.

Application Level Data: New kinds of events can be defined in an application, and logging can be turned on for them, thus generating histories of these specially defined events.
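For Web server data, a first step is often to turn raw log lines into the sequence of URLs each visitor requested. The sketch below assumes the log uses the Common Log Format. The sample lines, the `LOG_PATTERN` regex, and the `url_sequences` helper are invented for illustration. Note that an IP address is only a rough stand-in for a user.

```python
import re
from collections import defaultdict

# Assumed layout: Common Log Format, e.g.
# ip ident user [time] "METHOD url protocol" status size
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) \S+'
)

# Hypothetical log lines, invented for illustration.
sample_log = """\
192.0.2.1 - - [12/Apr/2017:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 1024
192.0.2.1 - - [12/Apr/2017:10:00:05 +0000] "GET /products.html HTTP/1.1" 200 2048
198.51.100.7 - - [12/Apr/2017:10:00:07 +0000] "GET /index.html HTTP/1.1" 200 1024
192.0.2.1 - - [12/Apr/2017:10:00:09 +0000] "GET /cart.html HTTP/1.1" 200 512
"""


def url_sequences(log_text):
    """Group requested URLs by client IP, in the order they appear."""
    sequences = defaultdict(list)
    for line in log_text.splitlines():
        match = LOG_PATTERN.match(line)
        if match:  # silently skip lines that do not fit the assumed format
            sequences[match["ip"]].append(match["url"])
    return dict(sequences)


sequences = url_sequences(sample_log)
```

Per-visitor sequences like these are the raw material for pattern discovery, such as finding pages that are frequently visited together.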

Applications of Web Mining

Information retrieval on the Web

Network Management

E-commerce

Conclusion

The web and its usage continue to grow.

The past five years have seen the emergence of web mining as a rapidly growing area, due to the efforts of the research community as well as various organizations that are practicing it.

Thank you