Date post: | 08-Apr-2018 |
Upload: | suhaskumar86 |
8/7/2019 Web user session
WEB USER-SESSION INFERENCE BY MEANS
INTRODUCTION
The explosive growth of the Web has drastically changed the way in which
information is managed and accessed. The large scale of Web data sources and the wide
availability of services over the Internet have increased the need for effective Web data
management techniques and mechanisms. Understanding how users navigate over Web
sources is essential both for computing practitioners and researchers. In this context, Web
data clustering has been widely used for increasing Web information accessibility,
understanding users' navigation behavior, and improving information retrieval and content
delivery on the Web.
The Web has overwhelmed the typical computer user with an enormous flood of information. On almost any topic one
can think of, one can find pieces of information that are made available by other internet
citizens, ranging from individual users that post an inventory of their record collection, to
major companies that do business over the Web. To be able to cope with the abundance
of available information, users of the WWW need to rely on intelligent tools that assist
them in finding, sorting, and filtering the available information. Just as data mining aims
at discovering valuable information that is hidden in conventional databases, the
emerging field of Web mining aims at finding and extracting relevant information that is
hidden in Web-related data, in particular in text documents that are published on the
Web. Like data mining, Web mining is a multi-disciplinary effort that draws techniques
from fields like information retrieval, statistics, machine learning, natural language
processing, and others.
The World Wide Web has become increasingly important as a medium for
commerce as well as for dissemination of information. In e-commerce, companies want
to analyze users' preferences to place advertisements, to decide their market strategy,
and to provide customized guidance to Web customers. In today's information-based society,
there is a pressing need for Web surfers to find the information they need among the overwhelming
resources on the Internet. The Web access log contains a lot of information that allows us to
observe users' interest in the site. Properly exploited, this information can help us
improve the Web site, create a more effective Web site organization, and
help users navigate through enormous collections of Web documents. Therefore, data mining, which is
also referred to as knowledge discovery in databases (KDD), has been naturally introduced to
the World Wide Web.
Web usage data collected in the access log is at a very fine granularity. It usually
includes every HTTP request from all users. Each request contains at least the IP address,
requested pages, time requested, response code, and size of the item requested. Therefore,
while the access log has the advantage of being extremely detailed, it also has some
drawbacks. When we apply statistical and probabilistic methods to it, we tend to get results
that are more fine-grained than they should be, because the analysis might focus on micro trends
rather than macro trends. However, based on our observation, users' browsing behavior
on the Web is highly uncertain. Users might browse the same page for different purposes,
spend various amounts of time on the same page, make a different number of visits to it,
or even get to the page from different sources each time. Therefore, micro trends tend to
be erroneous and not of much use.
Depending on the nature of the data, one can distinguish three main areas of
research within the Web mining community:
Web Content Mining: application of data mining techniques to unstructured or
semi-structured data, usually HTML documents
Web Structure Mining: use of the hyperlink structure of the Web as an (additional)
information source
Web Usage Mining: analysis of user interactions with a Web server (e.g., click stream
analysis)
AN OVERVIEW OF WEB USAGE MINING
Web Usage Mining Process:
The main processes in Web Usage Mining are:
Preprocessing: Data preprocessing describes any type of processing performed on
raw data to prepare it for another processing procedure. Commonly used as a preliminary
data mining practice, data preprocessing transforms the data into a format that will be
more easily and effectively processed for the purpose of the user.
The different types of preprocessing in Web Usage Mining are:
1. Usage Preprocessing: preprocessing relating to the usage patterns of users.
2. Content Preprocessing: preprocessing of the content accessed.
3. Structure Preprocessing: preprocessing related to the structure of the website.
Pattern Discovery: Web Usage mining can be used to uncover patterns in server
logs but is often carried out only on samples of data. The mining process will be
ineffective if the samples are not a good representation of the larger body of data.
The following are the pattern discovery methods.
1. Statistical Analysis
2. Association Rules
3. Clustering
4. Classification
5. Sequential Patterns
6. Dependency Modeling
Recently, data mining techniques have been applied to extract usage patterns from
Web log data. This process, known as Web usage mining, is traditionally performed in
several stages to achieve its goals:
1. Collection of Web data such as activities/click streams recorded in Web server logs,
2. Preprocessing of Web data such as filtering crawlers' requests, requests to
graphics, and identifying unique sessions,
3. Analysis of Web data, also known as Web Usage Mining, to discover
interesting usage patterns or profiles,
4. Interpretation/evaluation of the discovered profiles.
A fifth step is further added after a repetitive application of steps 1-4 on multiple time periods:
5. Tracking the evolution of the discovered profiles.
Web usage mining can use various data mining or machine learning techniques to
model and understand Web user activity. Clustering has been used to segment user sessions
into clusters or profiles that can later form the basis for personalization. The notion of an
adaptive Web site has been proposed, where the users' access patterns can be used to
automatically synthesize index pages. Other work uses association rule
discovery as the basis for modeling Web user activity, whereas another approach
uses probabilistic grammars to model Web navigation patterns for the purpose of
prediction. A further approach builds data cubes from Web log data and later
applies Online Analytical Processing (OLAP) and data mining on the cube model. Web
Utilization Miner (WUM) was presented to discover navigation patterns with
user-specified characteristics over an aggregated materialized view of the Web log, consisting
of a trie of sequences of Web views. Techniques for tracking changes in Web usage have recently become important. This is
because Web access patterns on a Web site are dynamic due not only to the dynamics of
Web site content and structure but also to changes in the users' interests and, thus, their
navigation patterns.
To create user groups, i.e., user profiles, we only have access to the users'
browsing history. We assume that users with similar browsing patterns up to a point in time
should have similar interests and motivations. User profiles can be created through online
web usage mining, which consists of discovering web usage patterns to better understand
the users' behavior. For the task of creating user groups based on the web server access
logs, web usage mining uses clustering techniques. During a visit to a web site, the users'
requests are registered in a web server access log stored on the web server.
Therefore, the web server access logs provide the means to create a data set prepared for
the application of clustering algorithms. Users' interests are affected by the temporal
context; thus, instead of creating user clusters, some research work presents the
concept of session clustering. A session comprises the browsing history of a small time
window, usually 30 minutes. Therefore, by clustering sessions it is easier to comprehend
the contextual motivations of each user and provide ads suitable for the user's current
interests. We propose to create user profiles in a two-stage process. First, one creates session
clusters, then creates user clusters based on common sessions between users. This work
focuses on the first part, creating session clusters. It compares the session
clusters obtained by using different attributes to describe a session. Our approach for representing
a session consists of combining descriptions extracted from the URLs of the pages visited
with temporal frames based on date, such as Monday morning.
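As a rough illustration of this session representation, the sketch below combines tokens taken from the visited URLs with a coarse temporal frame. The function names (`temporal_frame`, `describe_session`), the tokenization rule, and the hour boundaries for each part of the day are assumptions made for this example; the text does not specify them.

```python
import re
from urllib.parse import urlparse

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def temporal_frame(moment):
    """Map a datetime to a frame such as 'Monday morning'.

    The hour boundaries below are assumed for illustration only.
    """
    if 6 <= moment.hour < 12:
        part = "morning"
    elif 12 <= moment.hour < 18:
        part = "afternoon"
    elif 18 <= moment.hour < 24:
        part = "evening"
    else:
        part = "night"
    return f"{DAY_NAMES[moment.weekday()]} {part}"


def describe_session(urls, start):
    """Return the set of attributes describing one session."""
    attributes = set()
    for url in urls:
        parsed = urlparse(url)
        # Split host and path into lowercase alphabetic tokens.
        text = (parsed.netloc + parsed.path).lower()
        attributes.update(re.findall(r"[a-z]+", text))
    attributes.add(temporal_frame(start))
    return attributes
```

Each session then becomes a set of attributes that can be turned into a vector (for example, one dimension per attribute) before clustering.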
Session Identification
A session is a list of web page accesses from a given user during a period of time.
Each access is registered in a line of the web server access log. To identify
the list of web pages visited during a user's session, it is necessary to remove all the
information contained in the web server access logs that is meaningless or not relevant.
However, browser and proxy caching represent a major drawback to the creation of a
reliable user session data set. The web server access log is a text file that contains all the
requests made to the web server, usually in the Common Log Format extended with referrer and agent fields, which
means that it contains the following fields:
_ IP address or domain name
_ User ID
_ Date and time of the request
_ HTTP request (including method and page requested)
_ Status code response to the request
_ File size
_ Referrer (web page that contains the hyperlink that originated the request)
_ Web agent (user's browser)
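The fields listed above can be extracted from a log line with a regular expression. The following is a minimal sketch, assuming the usual space-separated layout with the request, referrer, and agent in double quotes; the pattern and the name `parse_log_line` are illustrative and may need adjusting for a particular server configuration.

```python
import re

# Host, identity, user, [time], "request", status, size,
# optionally followed by "referrer" and "agent".
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-)'
    r'(?: "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)")?'
)


def parse_log_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["status"] = int(entry["status"])
    # A size of "-" means no body was returned.
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    return entry
```

Lines that do not match the pattern return `None`, so malformed entries can be dropped during cleaning.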
The web server access logs used during this work contain accesses to web pages
from several web sites and in this case, the URL of the web pages is in the referrer. There
is also extra information about the request, such as a session cookie and a long-duration
cookie. The session cookie identifies a 30-minute session and the long-duration cookie
identifies a user. Therefore, only web server access log entries containing the session
cookie were considered. From these entries the web page URL (referrer), date and
session cookie are the meaningful data for the purpose of this work. Thereafter, these
parameters were grouped by common session cookie in order to create each session
representation vector.
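The grouping step described above can be sketched as follows. The input layout, a sequence of `(session_cookie, date, url)` tuples, and the name `group_by_session` are assumptions for this example; entries without a session cookie are skipped, as in the text.

```python
from collections import defaultdict


def group_by_session(entries):
    """Group (session_cookie, date, url) tuples by session cookie.

    Returns a dict mapping each cookie to its visits, sorted by date.
    """
    sessions = defaultdict(list)
    for cookie, date, url in entries:
        if not cookie:
            # Only entries that carry a session cookie are considered.
            continue
        sessions[cookie].append((date, url))
    return {cookie: sorted(visits) for cookie, visits in sessions.items()}
```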
Among the pattern discovery methods, clustering allows us to group together clients or data items that have
similar characteristics. The information discovered by this technique is one of the most
important types and has a wide range of applications, from real-time personalization to
link prediction. It can facilitate the development of future marketing strategies, such as
automated return mail, presenting advertisements to clients falling within a certain cluster, or
dynamically changing a particular site for a client on a return visit based on past
classification of that client. The key problem lies in how we effectively discover clusters
of Web pages or users with common interest. Clustering analysis to mine the Web is
quite different from traditional clustering due to the inherent difference between Web
usage data clustering and classic clustering. Therefore, there is a need to develop
specialized techniques for clustering analysis based on Web usage data. Some approaches
to clustering analysis have been developed for mining the Web access logs.
Session Clustering
After the transformation of user sessions into a multi-dimensional space
as vectors of extracted attributes, clustering algorithms can partition this space into groups
of sessions. Each session within a group is close to the others in the
group, based on a distance measure. Regarding the clustering algorithms, both
model-based and similarity-based algorithms are used to group users or sessions, as well as hierarchical
and partitional techniques. The most common model-based algorithm is the
Expectation-Maximization (EM) algorithm, which has been used to identify associations among users
and pages as well as to provide user profiles.
The proposed methodology is applied to Web users' navigation patterns by a
model-based approach employing:
Cluster validation, i.e. evaluation of the results of a clustering algorithm in a
quantitative and objective manner. We propose a quantitative validation procedure, which
is based on the statistical chi-square (χ²) test. Each cluster is represented by a probability
distribution and the chi-square metric is used to measure the distances between these
distributions and to test their homogeneity. Since the goal of a clustering procedure is to
discover groups in the data so that each group is significantly different from all the
others, we essentially test the heterogeneity between the clusters in order to assess their
successful discrimination.
Cluster interpretation, i.e. understanding and appropriately interpreting the
meaning of the derived clusters in the wider context of the underlying application, by
using statistical data analysis. Specifically, we propose a visualization approach based on
the statistical method known as correspondence analysis, for interpreting
the clustering results. This analysis is used to help reveal similar or related
features in Web users' navigation behavior and their interaction with the content of Web
information sources.
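To make the cluster validation step more concrete, the sketch below computes the chi-square homogeneity statistic for two clusters, each summarized as a vector of counts (for example, visits per page category). The count-vector representation and the name `chi_square_statistic` are assumptions for this example, not the exact procedure of the proposed methodology.

```python
def chi_square_statistic(counts_a, counts_b):
    """Chi-square homogeneity statistic for two clusters' count vectors."""
    if len(counts_a) != len(counts_b):
        raise ValueError("both clusters must use the same categories")
    total_a, total_b = sum(counts_a), sum(counts_b)
    if total_a == 0 or total_b == 0:
        raise ValueError("each cluster needs at least one observation")
    grand_total = total_a + total_b
    statistic = 0.0
    for a, b in zip(counts_a, counts_b):
        column_total = a + b
        if column_total == 0:
            continue  # category unused by both clusters
        for observed, row_total in ((a, total_a), (b, total_b)):
            # Expected count if both clusters shared one distribution.
            expected = row_total * column_total / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic
```

A value near zero suggests that the two clusters follow the same distribution; a large value, compared against the chi-square critical value with one fewer degrees of freedom than the number of categories, indicates that the clusters are well discriminated.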
Clustering evaluation may be employed under three different views:
1. External view: when results of a clustering method are evaluated on the basis of
a pre-specified structure on a data set, which reflects a user's intuition about the
clustering structure of this data set.
2. Internal view: clustering results are evaluated in terms of quantities obtained
from the data set itself.
3. Relative view: clustering result is compared with other clustering schemes, by
modifying only the parameter values.
System Architecture:
[Figure: Web user session clustering tech. Blocks: Trace user session details; Clustering Selection; Hierarchical Agglomerative; Clustering Creation]
Modules
1. Trace user details
2. Clustering selection
3. Hierarchical Agglomerative
4. Clustering Creation
1. Trace user Details
In this module we trace user session details. A Web user is identified by its client IP
address and by connections having TCP server port equal to 80 (HTTP protocol). Each
user trace, i.e., a trace containing only data with a given IP source address, is
preprocessed according to the following steps: i) data are partitioned day by day, ii) only
working hours of working days are considered, and iii) opening times of two consecutive
connections separated by more than half an hour are considered a priori as two
independent data sets.
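The three preprocessing steps can be sketched as below for one user trace, given as a list of connection opening times. The working-hours window (08:00 to 18:00, Monday to Friday) and the name `preprocess_trace` are assumptions for this example, since the text does not state the exact hours.

```python
from datetime import timedelta

WORK_START_HOUR = 8    # assumed start of working hours
WORK_END_HOUR = 18     # assumed end of working hours
MAX_GAP = timedelta(minutes=30)


def preprocess_trace(opening_times):
    """Split one user's connection opening times into independent data sets."""
    # Step ii: keep only working hours of working days (Monday to Friday).
    kept = sorted(
        t for t in opening_times
        if t.weekday() < 5 and WORK_START_HOUR <= t.hour < WORK_END_HOUR
    )
    data_sets = []
    for t in kept:
        previous = data_sets[-1][-1] if data_sets else None
        # Steps i and iii: start a new data set on a new day or after
        # a gap of more than half an hour.
        if previous is None or t.date() != previous.date() or t - previous > MAX_GAP:
            data_sets.append([t])
        else:
            data_sets[-1].append(t)
    return data_sets
```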
2. Clustering Selection
In this module we read the user session details and use K-means clustering to form
an initial clustering of them.
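A minimal K-means sketch for this initial clustering step follows. It assumes one-dimensional samples (for example, connection opening times expressed in seconds) and a deterministic, evenly spread initialization; both choices, and the name `kmeans_1d`, are assumptions made to keep the example short.

```python
def kmeans_1d(samples, k, iterations=10):
    """Cluster 1-D samples into k groups; returns (centroids, clusters)."""
    ordered = sorted(samples)
    n = len(ordered)
    # Assumed initialization: spread centroids evenly over the sorted samples.
    centroids = [ordered[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for x in samples:
            # Assign each sample to its nearest centroid.
            nearest = min(range(k), key=lambda i: abs(x - centroids[i]))
            clusters[nearest].append(x)
        # Recompute centroids; an empty cluster keeps its old centroid.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters
```

The resulting centroids can serve as the cluster representatives used by the later modules.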
3. Hierarchical Agglomerative:
In this module a hierarchical agglomerative algorithm is iteratively run, using only
the representative samples to evaluate the distance between two clusters. Since the
procedure starts with a finite set of initial clusters and each step merges two of them, the number of steps is bounded. At each step, the
hierarchical agglomerative procedure merges the two closest clusters; then, distances
among clusters are recomputed. The process ends once no further merge is possible, i.e., after at most one fewer iterations than the number of initial clusters.
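The merging loop can be sketched as follows for one-dimensional cluster representatives. Each cluster is given as a `(representative, size)` pair, the merged representative is the size-weighted mean, and merging stops at a caller-supplied `target_count`; these choices and the name `agglomerate` are assumptions for this example, not the stopping rule of the described system.

```python
def agglomerate(clusters, target_count):
    """Merge the two closest clusters until target_count clusters remain.

    clusters: list of (representative, size) pairs with 1-D representatives.
    """
    clusters = sorted(clusters)
    while len(clusters) > max(target_count, 1):
        # In sorted 1-D data, the closest pair is always adjacent.
        i = min(
            range(len(clusters) - 1),
            key=lambda j: clusters[j + 1][0] - clusters[j][0],
        )
        (rep_a, size_a), (rep_b, size_b) = clusters[i], clusters[i + 1]
        merged_size = size_a + size_b
        # The merged representative lies between the two originals,
        # so the list stays sorted after replacement.
        merged_rep = (rep_a * size_a + rep_b * size_b) / merged_size
        clusters[i:i + 2] = [(merged_rep, merged_size)]
    return clusters
```

Because only the representatives are compared, each step is cheap even when the original clusters contain many samples.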
4. Clustering Creation
A partition clustering procedure is run over the original data set, which includes all
samples using the optimal number of clusters determined so far and the same choice of
cluster representatives adopted in the first step. A fixed number of iterations is run to
obtain a final refinement of the clustering.
LANGUAGE SPECIFICATION
4.1 FEATURES OF .NET
Microsoft .NET is a set of Microsoft software
technologies for rapidly building and integrating XML Web
services, Microsoft Windows-based applications, and Web
solutions. The .NET Framework is a language-neutral platform for
writing programs that can easily and securely interoperate.
There's no language barrier with .NET: there are numerous
languages available to the developer, including Managed C++,
C#, Visual Basic, and JScript. The .NET framework provides the
foundation for components to interact seamlessly, whether locally
or remotely on different platforms. It standardizes common data
types and communications protocols so that components created
in different languages can easily interoperate.
.NET is also the collective name given to various
software components built upon the .NET platform. These will be
both products (Visual Studio.NET and Windows.NET Server, for
instance) and services (like Passport, .NET My Services, and so
on).
THE .NET FRAMEWORK
The .NET Framework has two main parts:
1. The Common Language Runtime (CLR).
2. A hierarchical set of class libraries.
The CLR is described as the execution engine of .NET. It
provides the environment within which programs run. The most
important features are:
Conversion from a low-level assembler-style language,
called Intermediate Language (IL), into code native to
the platform being executed on.
Memory management, notably including garbage
collection.
Checking and enforcing security restrictions on the
running code.
Loading and executing programs, with version control
and other such features.
The following features of the .NET framework are also
worth description:
Managed Code
The code that targets .NET, and which contains certain
extra information - metadata - to describe itself. Whilst both managed
and unmanaged code can run in the runtime, only managed code
contains the information that allows the CLR to guarantee, for
instance, safe execution and interoperability.
Managed Data
With Managed Code comes Managed Data. The CLR
provides memory allocation and deallocation facilities, and
garbage collection. Some .NET languages use Managed Data by
default, such as C#, Visual Basic.NET and JScript.NET, whereas
others, namely C++, do not. Targeting the CLR can, depending on the
language you're using, impose certain constraints on the features
available. As with managed and unmanaged code, one can have
both managed and unmanaged data in .NET applications - data
that doesn't get garbage collected but instead is looked after by
unmanaged code.
Common Type System
The CLR uses something called the Common Type System
(CTS) to strictly enforce type-safety. This ensures that all classes
are compatible with each other, by describing types in a common
way. The CTS defines how types work within the runtime, which
enables types in one language to interoperate with types in
another language, including cross-language exception handling.
As well as ensuring that types are only used in appropriate ways,
the runtime also ensures that code doesn't attempt to access
memory that hasn't been allocated to it.
Common Language Specification
The CLR provides built-in support for language
interoperability. To ensure that you can develop managed code
that can be fully used by developers using any programming
language, a set of language features and rules for using them
called the Common Language Specification (CLS) has been
defined. Components that follow these rules and expose only CLS
features are considered CLS-compliant.
THE CLASS LIBRARY
.NET provides a single-rooted hierarchy of classes,
containing over 7000 types. The root of the namespace is called
System; this contains basic types like Byte, Double, Boolean, and
String, as well as Object. All objects derive from System.Object.
As well as objects, there are value types. Value types can be
allocated on the stack, which can provide useful flexibility. There
are also efficient means of converting value types to object types
if and when necessary.
The set of classes is pretty comprehensive, providing
collections, file, screen, and network I/O, threading, and so on, as
well as XML and database connectivity.
The class library is subdivided into a number of sets (or
namespaces), each providing distinct areas of functionality, with
dependencies between the namespaces kept to a minimum.
LANGUAGES SUPPORTED BY .NET
The multi-language capability of the .NET Framework
and Visual Studio .NET enables developers to use their existing
programming skills to build all types of applications and XML Web
services. The .NET framework supports new versions of
Microsoft's old favorites Visual Basic and C++ (as VB.NET and
Managed C++), but there are also a number of new additions to
the family.
Visual Basic .NET has been updated to include many
new and improved language features that make it a powerful
object-oriented programming language. These features include
inheritance, interfaces, and overloading, among others. Visual
Basic also now supports structured exception handling, custom
attributes, and multi-threading.
Visual Basic .NET is also CLS compliant, which means
that any CLS-compliant language can use the classes, objects,
and components you create in Visual Basic .NET.
Managed Extensions for C++ and attributed
programming are just some of the enhancements made to the C++
language. Managed Extensions simplify the task of migrating
existing C++ applications to the new .NET Framework.
C# is Microsoft's new language. It's a C-style language
that is essentially C++ for Rapid Application Development.
Unlike other languages, its specification is just the grammar of
the language. It has no standard library of its own, and instead
has been designed with the intention of using the .NET libraries as
its own.
Microsoft Visual J# .NET provides the easiest transition
for Java-language developers into the world of XML Web Services
and dramatically improves the interoperability of Java-language
programs with existing software written in a variety of other
programming languages.
ActiveState has created Visual Perl and Visual Python,
which enable .NET-aware applications to be built in either Perl or
Python. Both products can be integrated into the Visual Studio
.NET environment. Visual Perl includes support for ActiveState's
Perl Dev Kit.
Other languages for which .NET compilers are available include
FORTRAN
COBOL
Eiffel
Fig 1. .NET Framework (layers, top to bottom: ASP.NET, XML Web Services, and Windows Forms; Base Class Libraries; Common Language Runtime; Operating System)
4.2 FEATURES OF C#.NET
C#.NET is also compliant with CLS (Common Language Specification) and
supports structured exception handling. CLS is a set of rules and constructs that
are supported by the CLR (Common Language Runtime). CLR is the runtime
environment provided by the .NET Framework; it manages the execution of the
code and also makes the development process easier by providing services.
C#.NET is a CLS-compliant language. Any objects, classes, or components that
are created in C#.NET can be used in any other CLS-compliant language. In
addition, we can use objects, classes, and components created in other
CLS-compliant languages in C#.NET. The use of CLS ensures complete
interoperability among applications, regardless of the languages used to create
the application.
CONSTRUCTORS AND DESTRUCTORS:
Constructors are used to initialize objects, whereas destructors are used to
destroy them. In other words, destructors are used to release the resources
allocated to the object. In C#.NET, a destructor (finalizer) is written with the
~ClassName syntax, which the compiler translates into an override of the Finalize method. The
finalizer is used to complete the tasks that must be performed
when an object is destroyed. It is called automatically by the garbage collector
when an object is destroyed. In addition, a finalizer cannot be
called explicitly from C# code; the underlying Finalize method is protected, so it is visible only to the class it belongs to and to derived classes.
GARBAGE COLLECTION
Garbage Collection is another new feature in C#.NET. The .NET Framework
monitors allocated resources, such as objects and variables. In addition, the
.NET Framework automatically releases memory for reuse by destroying
objects that are no longer in use.
In C#.NET, the garbage collector checks for the objects that are not currently in
use by applications. When the garbage collector comes across an object that is
marked for garbage collection, it releases the memory occupied by the object.
OVERLOADING
Overloading is another feature in C#. Overloading enables us to define multiple
methods with the same name, where each method has a different set of
parameters. Besides using overloading for methods, we can use it for
constructors, indexers, and operators in a class.
MULTITHREADING:
C#.NET also supports multithreading. An application that supports
multithreading can handle multiple tasks simultaneously. We can use
multithreading to decrease the time taken by an application to respond to user
interaction.
STRUCTURED EXCEPTION HANDLING
C#.NET supports structured exception handling, which enables us to detect and
handle errors at runtime. In C#.NET, we use try...catch...finally
statements to create exception handlers. Using try...catch...finally statements,
we can create robust and effective exception handlers to improve the
reliability of our application.
THE .NET FRAMEWORK
The .NET Framework is a new computing platform that simplifies
application development in the highly distributed environment of the Internet.
OBJECTIVES OF .NET FRAMEWORK
1. To provide a consistent object-oriented programming environment whether
object code is stored and executed locally, executed locally but Internet-distributed, or executed
remotely.
2. To provide a code-execution environment that minimizes software deployment
conflicts and guarantees safe execution of code.
3. To eliminate the performance problems of scripted or interpreted environments.
There are different types of application, such as Windows-based applications
and Web-based applications.