Web user session

8/7/2019 Web user session


WEB USER-SESSION INFERENCE BY MEANS

INTRODUCTION

The explosive growth of the Web has drastically changed the way in which

information is managed and accessed. The large scale of Web data sources and the wide

availability of services over the Internet have increased the need for effective Web data

management techniques and mechanisms. Understanding how users navigate over Web

sources is essential both for computing practitioners and researchers. In this context, Web

data clustering has been widely used for increasing Web information accessibility,

understanding users' navigation behavior, improving information retrieval and content

delivery on the Web.

The Web has overwhelmed the computer user with an enormous flood of information. On almost any topic one

can think of, one can find pieces of information that are made available by other internet

citizens, ranging from individual users that post an inventory of their record collection, to

major companies that do business over the Web. To be able to cope with the abundance

of available information, users of the WWW need to rely on intelligent tools that assist

them in finding, sorting, and filtering the available information. Just as data mining aims

at discovering valuable information that is hidden in conventional databases, the

emerging field of Web mining aims at finding and extracting relevant information that is

hidden in Web-related data, in particular in text documents that are published on the

Web. Like data mining, Web mining is a multi-disciplinary effort that draws techniques

from fields like information retrieval, statistics, machine learning, natural language

processing, and others.



The World Wide Web has become increasingly important as a medium for
commerce as well as for dissemination of information. In e-commerce, companies want
to analyze users' preferences to place advertisements, to decide their market strategy,
and to provide customized guidance to Web customers. In today's information-based
society, there is an urge for Web surfers to find the needed information among the
overwhelming resources on the Internet. A Web access log contains a lot of information
that allows us to observe users' interest in the site. Properly exploited, this information
can help us improve the Web site, create a more effective Web site organization, and
help users navigate through enormous numbers of Web documents. Therefore, data
mining, which is also referred to as knowledge discovery in databases (KDD), has been
naturally introduced to the World Wide Web.

Web usage data collected in an access log is at a very fine granularity. It usually
includes every HTTP request from all users. Each request contains at least the IP address,
requested page, time of the request, response code, and size of the item requested.
Therefore, while the access log has the advantage of being extremely detailed, it also has
some drawbacks. When we apply statistical and probabilistic methods to it, we tend to get
results that are more fine-grained than they should be, because the analysis might focus
on micro trends rather than macro trends. However, based on our observation, users'
browsing behavior on the Web is highly uncertain. Users might browse the same page for
different purposes, spend various amounts of time on the same page, make a different
number of visits to it, or even get to the page from different sources each time. Therefore,
micro trends tend to be erroneous and not of much use.

Depending on the nature of the data, one can distinguish three main areas of

research within the Web mining community:

Web Content Mining: application of data mining techniques to unstructured or

semi-structured data, usually HTML documents

Web Structure Mining: use of the hyperlink structure of the Web as an (additional)

information source



Web Usage Mining: analysis of user interactions with a Web server (e.g., click stream

analysis)

AN OVERVIEW OF WEB USAGE MINING

Web Usage Mining Process:

The main processes in Web Usage Mining are:

Preprocessing: Data preprocessing describes any type of processing performed on

raw data to prepare it for another processing procedure. Commonly used as a preliminary

data mining practice, data preprocessing transforms the data into a format that will be

more easily and effectively processed for the purpose of the user.

The different types of preprocessing in Web Usage Mining are:

1. Usage Pre-Processing: pre-processing related to the usage patterns of users.

2. Content Pre-Processing: pre-processing of the content accessed.

3. Structure Pre-Processing: pre-processing related to the structure of the website.

Pattern Discovery: Web Usage Mining can be used to uncover patterns in server

logs but is often carried out only on samples of data. The mining process will be

ineffective if the samples are not a good representation of the larger body of data.

The following are the pattern discovery methods.

1. Statistical Analysis

2. Association Rules

3. Clustering

4. Classification

5. Sequential Patterns

6. Dependency Modeling



Recently, data mining techniques have been applied to extract usage patterns from
Web log data. This process, known as Web usage mining, is traditionally performed in
several stages to achieve its goals:

1. Collection of Web data such as activities/click streams recorded in Web server logs.

2. Preprocessing of Web data, such as filtering crawler requests and requests to
graphics, and identifying unique sessions.

3. Analysis of Web data, also known as Web Usage Mining, to discover
interesting usage patterns or profiles.

4. Interpretation/evaluation of the discovered profiles.

5. Tracking the evolution of the discovered profiles. This fifth step is added after
a repeated application of steps 1-4 over multiple time periods.
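The preprocessing stage (step 2) can be illustrated with a minimal sketch. The extension list, the crawler keywords, and the function name `is_relevant_request` are assumptions made for this example only; a real deployment would need lists tuned to its own traffic.

```python
# Hypothetical filter lists: these values are illustrative assumptions,
# not taken from the original work.
GRAPHIC_EXTENSIONS = (".gif", ".jpg", ".jpeg", ".png", ".ico", ".bmp")
CRAWLER_HINTS = ("bot", "crawler", "spider")


def is_relevant_request(page, agent):
    """Return True if a request should be kept for usage mining."""
    # Ignore the query string when checking the file extension.
    path = page.split("?", 1)[0].lower()
    if path.endswith(GRAPHIC_EXTENSIONS):
        return False  # request to a graphic, not a page view
    agent = agent.lower()
    # Drop requests whose user agent looks like an automated crawler.
    return not any(hint in agent for hint in CRAWLER_HINTS)
```

Requests that pass this filter would then be handed to session identification.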

Web usage mining can use various data mining or machine learning techniques to
model and understand Web user activity. Clustering has been used to segment user
sessions into clusters or profiles that can later form the basis for personalization. The
notion of an adaptive Web site has been proposed, where the users' access patterns can
be used to automatically synthesize index pages. Other work is based on using
association rule discovery as the basis for modeling Web user activity, whereas another
approach used probabilistic grammars to model Web navigation patterns for the purpose
of prediction. A further approach proposed building data cubes from Web log data and
later applying Online Analytical Processing (OLAP) and data mining on the cube model.
Web Utilization Miner (WUM) was presented to discover navigation patterns with
user-specified characteristics over an aggregated materialized view of the Web log,
consisting of a trie of sequences of Web views. Tracking changes in Web usage has
recently become important. This is because Web access patterns on a Web site are
dynamic due not only to the dynamics of Web site content and structure but also to
changes in the users' interests and, thus, their navigation patterns.



In order to create user groups, i.e., user profiles, we only have access to the users'
browsing history. We assume that users with similar browsing patterns up to a point in
time should have similar interests and motivations. User profiles can be created through
online web usage mining, which consists in discovering web usage patterns to better
understand the users' behavior. For the task of creating user groups based on the web
server access logs, web usage mining uses clustering techniques. During a visit to a web
site, the users' requests are registered in an access log stored on the web server.
Therefore, the web server access logs provide the means to create a data set prepared for
the application of clustering algorithms. Users' interests are affected by the temporal
context, so some research work presents the concept of session clustering instead of
creating user clusters. A session comprises the browsing history of a small time
window, usually 30 minutes. Therefore, by clustering sessions it is easier to comprehend
the contextual motivations of each user and provide ads suitable for the user's current
interests. One proposal is to create user profiles in a two-stage process: first, one creates
session clusters, then creates user clusters based on common sessions between users.
This work focuses on the first part, creating session clusters. It compares the resulting
session clusters obtained by using different attributes to describe a session. Our approach
for representing a session consists in combining descriptions extracted from the URLs of
the pages visited with temporal frames based on date, such as Monday morning.
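As a rough illustration of this session representation, the sketch below combines terms taken from the URL path with a temporal frame such as "Monday morning". The boundaries of the day parts and the names `temporal_frame`, `url_terms`, and `describe_session` are assumptions introduced for the example; the original work does not specify them.

```python
import re
from collections import Counter
from datetime import datetime
from urllib.parse import urlparse

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]


def temporal_frame(moment):
    """Map a datetime to a label such as 'Monday morning'.

    The hour boundaries below are assumed, not taken from the source.
    """
    if 6 <= moment.hour < 12:
        part = "morning"
    elif 12 <= moment.hour < 18:
        part = "afternoon"
    elif 18 <= moment.hour < 24:
        part = "evening"
    else:
        part = "night"
    return f"{DAYS[moment.weekday()]} {part}"


def url_terms(url):
    """Extract lowercase alphanumeric terms from the path of a URL."""
    path = urlparse(url).path.lower()
    return [term for term in re.split(r"[^a-z0-9]+", path) if term]


def describe_session(views):
    """Build a bag of features from a list of (datetime, url) page views."""
    features = Counter()
    for _, url in views:
        features.update(url_terms(url))
    if views:
        # The frame of the first view stands for the whole session.
        features[temporal_frame(views[0][0])] += 1
    return features
```

The resulting counts can be turned into a fixed-length vector once a vocabulary of terms and frames has been chosen.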

Session Identification

A session is a list of web page accesses from a given user during a period of time.
Each access is registered in a line of the web server access log. For the task of identifying
the list of web pages visited during a user's session, it is necessary to remove all the
information contained in the web server access logs that is meaningless or not relevant.
However, browser and proxy caching represent a major drawback to the creation of a
reliable user session data set. The web server access log is a text file that contains all the
requests made to the web server. It is usually in the Common Log Format or its extended
(combined) variant, which means that it contains the following fields:



- IP address or domain name

- User ID

- Date and time of the request

- HTTP request (including method and page requested)

- Status code response to the request

- File size

- Referrer (web page that contains the hyperlink that originated the request)

- Web agent (user's browser)
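The fields listed above can be extracted from a log line with a regular expression. The following is a minimal sketch that assumes the usual Apache-style layout (`host ident user [time] "request" status size "referrer" "agent"`); the pattern and the function name `parse_log_line` are illustrative, and the logs used in this work may differ in detail.

```python
import re
from datetime import datetime

# Assumed layout: host ident user [time] "request" status size,
# optionally followed by "referrer" "agent".
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
    r'(?: "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)")?'
)


def parse_log_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    # A size of "-" means that no body was returned.
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    parts = entry["request"].split()
    entry["method"] = parts[0] if parts else None
    entry["page"] = parts[1] if len(parts) > 1 else None
    return entry
```

Lines that do not match the assumed layout are returned as `None`, so that they can be counted or inspected rather than silently misparsed.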

The web server access logs used during this work contain accesses to web pages

from several web sites and in this case, the URL of the web pages is in the referrer. There

is also extra information about the request such as a session cookie and a long duration

cookie. The session cookie identifies a 30-minute session and the long duration cookie

identifies a user. Therefore, only web server access log entries containing the session

cookie were considered. From these entries the web page URL (referrer), date and

session cookie are the meaningful data for the purpose of this work. Thereafter, these

parameters were grouped by common session cookie in order to create each session

representation vector.
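The grouping step described above might look like the following sketch. The dictionary keys `session_cookie`, `date`, and `referrer` are hypothetical names for the three retained fields, and dates are assumed to be mutually comparable values (for example ISO 8601 strings or `datetime` objects).

```python
from collections import defaultdict


def group_by_session(entries):
    """Group (date, page URL) pairs by session cookie.

    `entries` is an iterable of dicts; the key names are assumptions
    made for this sketch.
    """
    sessions = defaultdict(list)
    for entry in entries:
        cookie = entry.get("session_cookie")
        if not cookie:
            continue  # entries without a session cookie are not considered
        sessions[cookie].append((entry["date"], entry["referrer"]))
    # Order the page views of each session chronologically.
    return {cookie: sorted(views) for cookie, views in sessions.items()}
```

Each value of the returned dictionary is the raw material for one session representation vector.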

Among the pattern discovery methods, clustering allows us to group together clients or data items that have

similar characteristics. The information discovered by this technique is one of the most

important types that has a wide range of applications from real-time personalization to

link prediction. It can facilitate the development of future marketing strategies, such as

automated return mail, presenting advertisements to clients falling within a certain cluster, or

dynamically changing a particular site for a client on a return visit based on past

classification of that client. The key problem lies in how we effectively discover clusters

of Web pages or users with common interest. Clustering analysis to mine the Web is

quite different from traditional clustering due to the inherent difference between Web

usage data clustering and classic clustering. Therefore, there is a need to develop

specialized techniques for clustering analysis based on Web usage data. Some approaches

to clustering analysis have been developed for mining the Web access logs.



Session Clustering

After the transformation of user sessions into a multi-dimensional space
as vectors of extracted attributes, clustering algorithms can partition this space into
groups of sessions. Each session within a group is close to the others in the
group, based on a distance measure. Regarding the clustering algorithms, both
model-based and similarity-based approaches are used to group users or sessions, as well
as hierarchical and partitional techniques. The most common model-based algorithm is
the Expectation-Maximization (EM) algorithm, which has been used to identify
associations among users and pages as well as to provide user profiles.

The proposed methodology is applied to Web users' navigation patterns by a
model-based approach employing:

Cluster validation, i.e. evaluation of the results of a clustering algorithm in a
quantitative and objective manner. We propose a quantitative validation procedure, which
is based on the statistical chi-square (χ²) test. Each cluster is represented by a probability
distribution, and the chi-square metric is used to measure the distances between these
distributions and to test their homogeneity. Since the goal of a clustering procedure is to
discover groups in the data so that each group is significantly different from all the
others, we essentially test the heterogeneity between the clusters in order to assess their
successful discrimination.

Cluster interpretation, i.e. understanding and appropriately interpreting the
meaning of the derived clusters in the wider context of the underlying application, by
using statistical data analysis. Specifically, we propose a visualization approach based on
the statistical method known as correspondence analysis for interpreting
the clustering results. This analysis is used to help reveal similar or related
features in Web users' navigation behavior and their interaction with the content of Web
information sources.
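To make the chi-square validation step concrete, the sketch below computes the chi-square statistic of homogeneity for two clusters, each summarized by a vector of counts (for example, visits per page category). Representing a cluster by raw counts and the function name `chi_square_statistic` are assumptions of this example; the original procedure may normalize or compare the distributions differently.

```python
def chi_square_statistic(counts_a, counts_b):
    """Chi-square statistic for the 2 x K table formed by two count vectors.

    Both vectors must have the same length and a positive total.
    """
    total_a, total_b = sum(counts_a), sum(counts_b)
    grand_total = total_a + total_b
    statistic = 0.0
    for a, b in zip(counts_a, counts_b):
        column = a + b
        if column == 0:
            continue  # a category unused by both clusters carries no evidence
        # Expected counts if both clusters followed the same distribution.
        expected_a = total_a * column / grand_total
        expected_b = total_b * column / grand_total
        statistic += (a - expected_a) ** 2 / expected_a
        statistic += (b - expected_b) ** 2 / expected_b
    return statistic
```

With K categories the statistic has K - 1 degrees of freedom. A value above the critical value for the chosen significance level suggests that the two clusters are heterogeneous, i.e. successfully discriminated.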

Clustering evaluation may be employed under three different views:



1. External view: the results of a clustering method are evaluated on the basis of
a pre-specified structure on a data set, which reflects a user's intuition about the
clustering structure of this data set.

2. Internal view: clustering results are evaluated in terms of quantities obtained
from the data set itself.

3. Relative view: a clustering result is compared with other clustering schemes, by
modifying only the parameter values.

System Architecture:

[Figure: Web user session clustering technique. Stages: Trace user session details, Clustering Selection, Hierarchical Agglomerative, Clustering Creation]



Modules

1. Trace user details

2. Clustering selection

3. Hierarchical Agglomerative

4. Clustering Creation

1. Trace user Details

In this module we trace user session details. A Web user is identified by its client IP

address and by connections having TCP server port equal to 80 (HTTP protocol). Each

user trace, i.e., a trace containing only data with a given IP source address, is

preprocessed according to the following steps: i) data are partitioned day by day, ii) only

working hours of working days are considered, and iii) opening times of two consecutive

connections separated by more than half an hour are considered a priori as two

independent data sets.
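Step iii) above can be sketched as follows: connection opening times are sorted, and a new data set is started whenever two consecutive openings are more than half an hour apart. The function name `split_by_gap` is hypothetical, and steps i) and ii) (daily partitioning and the working-hours filter) are left out of this sketch.

```python
from datetime import datetime, timedelta


def split_by_gap(opening_times, max_gap=timedelta(minutes=30)):
    """Split connection opening times into independent groups.

    A new group starts whenever the gap to the previous opening time
    exceeds `max_gap`.
    """
    groups = []
    for moment in sorted(opening_times):
        if groups and moment - groups[-1][-1] <= max_gap:
            groups[-1].append(moment)
        else:
            groups.append([moment])
    return groups
```

Each returned group is then treated as a separate input to the clustering modules.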

2. Clustering Selection

In this module we read the user session details and initially cluster them using
K-means clustering.

3. Hierarchical Agglomerative

In this module a hierarchical agglomerative algorithm is iteratively run, using only
the representative samples to evaluate the distance between two clusters. Since the
procedure starts with the initial clusters, the number of steps is bounded. At each step,
the hierarchical agglomerative procedure merges the two closest clusters; then, distances
among clusters are recomputed. After a bounded number of iterations, the process ends.
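A minimal sketch of this merging loop is given below. It assumes that each initial cluster is summarized by one representative vector (for example a centroid) and its size, that distance is Euclidean, and that the representative of a merged cluster is the size-weighted mean of the two merged representatives. These choices and the name `merge_closest` are assumptions; the original module may define representatives differently.

```python
import math


def merge_closest(representatives, sizes, target_count):
    """Merge the two closest clusters until `target_count` clusters remain.

    Returns a list of (member_ids, representative, size) tuples, where
    member_ids are indices of the initial clusters.
    """
    clusters = [([i], list(rep), size)
                for i, (rep, size) in enumerate(zip(representatives, sizes))]
    while len(clusters) > max(target_count, 1):
        best = None
        # Find the closest pair using only the representatives.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = math.dist(clusters[i][1], clusters[j][1])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        ids_i, rep_i, n_i = clusters[i]
        ids_j, rep_j, n_j = clusters[j]
        # The merged representative is the size-weighted mean.
        rep = [(a * n_i + b * n_j) / (n_i + n_j) for a, b in zip(rep_i, rep_j)]
        merged = (ids_i + ids_j, rep, n_i + n_j)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters
```

Starting from K initial clusters, each pass of the loop removes exactly one cluster, so at most K - 1 merges are possible, which is why the number of steps is bounded.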



4. Clustering Creation

A partitional clustering procedure is run over the original data set, which includes all

samples, using the optimal number of clusters determined so far and the same choice of

cluster representatives adopted in the first step. A fixed number of iterations is run to

obtain a final refinement of the clustering.
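The final refinement can be sketched as a fixed number of assign-and-update passes in the style of K-means, starting from the representatives found earlier. Euclidean distance, mean-based representatives, and the name `refine_partition` are assumptions made for this illustration.

```python
import math


def refine_partition(samples, representatives, iterations=10):
    """Run a fixed number of assign/update passes over all samples.

    Returns (labels, representatives), where labels[i] is the index of
    the cluster assigned to samples[i].
    """
    reps = [list(rep) for rep in representatives]
    labels = []
    for _ in range(iterations):
        # Assignment: each sample goes to its nearest representative.
        labels = [min(range(len(reps)), key=lambda k: math.dist(sample, reps[k]))
                  for sample in samples]
        # Update: each representative becomes the mean of its members.
        for k in range(len(reps)):
            members = [s for s, label in zip(samples, labels) if label == k]
            if members:  # an empty cluster keeps its old representative
                reps[k] = [sum(column) / len(members) for column in zip(*members)]
    return labels, reps
```

Because the number of clusters is fixed in advance, this pass only adjusts the boundaries between clusters and does not change how many there are.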



LANGUAGE SPECIFICATION

4.1 FEATURES OF .NET

Microsoft .NET is a set of Microsoft software

technologies for rapidly building and integrating XML Web

services, Microsoft Windows-based applications, and Web

solutions. The .NET Framework is a language-neutral platform for

writing programs that can easily and securely interoperate.

There's no language barrier with .NET: there are numerous

languages available to the developer including Managed C++,

C#, Visual Basic and JScript. The .NET Framework provides the

foundation for components to interact seamlessly, whether locally

or remotely on different platforms. It standardizes common data

types and communications protocols so that components created

in different languages can easily interoperate.

.NET is also the collective name given to various

software components built upon the .NET platform. These will be

both products (Visual Studio .NET and Windows .NET Server, for



instance) and services (like Passport, .NET My Services, and so

on).

THE .NET FRAMEWORK

The .NET Framework has two main parts:

1. The Common Language Runtime (CLR).

2. A hierarchical set of class libraries.

The CLR is described as the execution engine of .NET. It

provides the environment within which programs run. The most

important features are:



- Conversion from a low-level assembler-style language, called Intermediate
Language (IL), into code native to the platform being executed on.

- Memory management, notably including garbage collection.

- Checking and enforcing security restrictions on the running code.

- Loading and executing programs, with version control and other such features.

The following features of the .NET framework are also

worth description:

Managed Code

The code that targets .NET, and which contains certain

extra information - metadata - to describe itself. Whilst both managed

and unmanaged code can run in the runtime, only managed code

contains the information that allows the CLR to guarantee, for

instance, safe execution and interoperability.



Managed Data

With Managed Code comes Managed Data. The CLR

provides memory allocation and deallocation facilities, and

garbage collection. Some .NET languages use Managed Data by

default, such as C#, Visual Basic.NET and JScript.NET, whereas

others, namely C++, do not. Targeting CLR can, depending on the

language you're using, impose certain constraints on the features

available. As with managed and unmanaged code, one can have

both managed and unmanaged data in .NET applications - data

that doesn't get garbage collected but instead is looked after by

unmanaged code.

Common Type System

The CLR uses something called the Common Type System

(CTS) to strictly enforce type-safety. This ensures that all classes

are compatible with each other, by describing types in a common



way. The CTS defines how types work within the runtime, which

enables types in one language to interoperate with types in

another language, including cross-language exception handling.

As well as ensuring that types are only used in appropriate ways,

the runtime also ensures that code doesn't attempt to access

memory that hasn't been allocated to it.

Common Language Specification

The CLR provides built-in support for language

interoperability. To ensure that you can develop managed code

that can be fully used by developers using any programming

language, a set of language features and rules for using them

called the Common Language Specification (CLS) has been

defined. Components that follow these rules and expose only CLS

features are considered CLS-compliant.



THE CLASS LIBRARY

.NET provides a single-rooted hierarchy of classes,

containing over 7000 types. The root of the namespace is called

System; this contains basic types like Byte, Double, Boolean, and

String, as well as Object. All objects derive from System.Object.

As well as objects, there are value types. Value types can be

allocated on the stack, which can provide useful flexibility. There

are also efficient means of converting value types to object types

if and when necessary.

The set of classes is pretty comprehensive, providing

collections, file, screen, and network I/O, threading, and so on, as

well as XML and database connectivity.

The class library is subdivided into a number of sets (or

namespaces), each providing distinct areas of functionality, with

dependencies between the namespaces kept to a minimum.



LANGUAGES SUPPORTED BY .NET

The multi-language capability of the .NET Framework

and Visual Studio .NET enables developers to use their existing

programming skills to build all types of applications and XML Web

services. The .NET framework supports new versions of

Microsoft's old favorites Visual Basic and C++ (as VB.NET and

Managed C++), but there are also a number of new additions to

the family.

Visual Basic .NET has been updated to include many

new and improved language features that make it a powerful

object-oriented programming language. These features include

inheritance, interfaces, and overloading, among others. Visual

Basic also now supports structured exception handling, custom

attributes, and multithreading.



Visual Basic .NET is also CLS compliant, which means

that any CLS-compliant language can use the classes, objects,

and components you create in Visual Basic .NET.

Managed Extensions for C++ and attributed

programming are just some of the enhancements made to the C++

language. Managed Extensions simplify the task of migrating

existing C++ applications to the new .NET Framework.

C# is Microsoft's new language. It's a C-style language

that is essentially C++ for Rapid Application Development.

Unlike other languages, its specification is just the grammar of

the language. It has no standard library of its own, and instead

has been designed with the intention of using the .NET libraries as

its own.

Microsoft Visual J# .NET provides the easiest transition

for Java-language developers into the world of XML Web Services

and dramatically improves the interoperability of Java-language



programs with existing software written in a variety of other

programming languages.

ActiveState has created Visual Perl and Visual Python,

which enable .NET-aware applications to be built in either Perl or

Python. Both products can be integrated into the Visual Studio

.NET environment. Visual Perl includes support for ActiveState's

Perl Dev Kit.

Other languages for which .NET compilers are available include

FORTRAN

COBOL



Eiffel

Fig 1. .NET Framework (layers, top to bottom: ASP.NET, XML Web Services, and Windows Forms; Base Class Libraries; Common Language Runtime; Operating System)

4.2 FEATURES OF C#.NET



C#.NET is also compliant with CLS (Common Language Specification) and

supports structured exception handling. CLS is a set of rules and constructs that

are supported by the CLR (Common Language Runtime). CLR is the runtime

environment provided by the .NET Framework; it manages the execution of the

code and also makes the development process easier by providing services.

C#.NET is a CLS-compliant language. Any objects, classes, or components that

are created in C#.NET can be used in any other CLS-compliant language. In

addition, we can use objects, classes, and components created in other

CLS-compliant languages in C#.NET. The use of CLS ensures complete

interoperability among applications, regardless of the languages used to create

the application.

CONSTRUCTORS AND DESTRUCTORS:

Constructors are used to initialize objects, whereas destructors are used to

destroy them. In other words, destructors are used to release the resources

allocated to the object. In C#.NET, a destructor is declared with the ~ClassName

syntax, which the compiler translates into an override of the Finalize method. The

Finalize method is used to complete the tasks that must be performed

when an object is destroyed. It is called automatically by the garbage collector

when the object is destroyed. In addition, Finalize is a protected method, so it is

accessible only from the class it belongs to or from derived classes.

GARBAGE COLLECTION

Garbage Collection is another new feature in C#.NET. The .NET Framework

monitors allocated resources, such as objects and variables. In addition, the

.NET Framework automatically releases memory for reuse by destroying

objects that are no longer in use.

In C#.NET, the garbage collector checks for the objects that are not currently in

use by applications. When the garbage collector comes across an object that is

marked for garbage collection, it releases the memory occupied by the object.

OVERLOADING

Overloading is another feature in C#. Overloading enables us to define multiple

procedures with the same name, where each procedure has a different set of

arguments. Besides using overloading for methods, we can use it for

constructors and indexers in a class.



MULTITHREADING:

C#.NET also supports multithreading. An application that supports

multithreading can handle multiple tasks simultaneously. We can use

multithreading to decrease the time taken by an application to respond to user

interaction.

STRUCTURED EXCEPTION HANDLING

C#.NET supports structured exception handling, which enables us to detect and

handle errors at runtime. In C#.NET, we use try...catch...finally

statements to create exception handlers. Using try...catch...finally statements,

we can create robust and effective exception handlers to improve the

reliability of our application.

THE .NET FRAMEWORK

The .NET Framework is a new computing platform that simplifies

application development in the highly distributed environment of the Internet.

OBJECTIVES OF .NET FRAMEWORK



1. To provide a consistent object-oriented programming environment whether

object code is stored and executed locally, executed locally but Internet-distributed,

or executed remotely.

2. To provide a code-execution environment that minimizes software deployment

conflicts and guarantees safe execution of code.

3. To eliminate performance problems.

There are different types of application, such as Windows-based applications

and Web-based applications.

Date post:	08-Apr-2018
Upload:	suhaskumar86