Web user session

Date post: 08-Apr-2018
Upload: suhaskumar86
  • 8/7/2019 Web user session

    WEB USER-SESSION INFERENCE BY MEANS OF CLUSTERING TECHNIQUES

    INTRODUCTION

    The explosive growth of the Web has drastically changed the way in which information is managed and accessed. The large scale of Web data sources and the wide availability of services over the Internet have increased the need for effective Web data management techniques and mechanisms. Understanding how users navigate over Web sources is essential for both computing practitioners and researchers. In this context, Web data clustering has been widely used to increase Web information accessibility, understand users' navigation behavior, and improve information retrieval and content delivery on the Web.

    The World Wide Web confronts the computer user with an enormous flood of information. On almost any topic one can think of, one can find pieces of information made available by other Internet citizens. These range from individual users who post an inventory of their record collection to major companies that do business over the Web. To cope with the abundance of available information, users of the WWW need to rely on intelligent tools that help them find, sort, and filter it. Data mining aims at discovering valuable information hidden in conventional databases. In the same way, the emerging field of Web mining aims at finding and extracting relevant information hidden in Web-related data, in particular in text documents published on the Web. Like data mining, Web mining is a multidisciplinary effort that draws techniques from fields such as information retrieval, statistics, machine learning, natural language processing, and others.

    The World Wide Web has become increasingly important as a medium for commerce as well as for the dissemination of information. In e-commerce, companies want to analyze users' preferences in order to place advertisements, decide their market strategy, and provide customized guidance to Web customers. In today's information-based society, Web surfers urgently need to find the information they want among the overwhelming resources on the Internet. A Web access log contains a great deal of information that lets us observe users' interests on the site. Properly exploited, this information can help us improve the Web site, create a more effective site organization, and help users navigate through an enormous number of Web documents. Data mining, also referred to as knowledge discovery in databases (KDD), has therefore naturally been applied to the World Wide Web.

    Web usage data collected in an access log has a very fine granularity. The log usually includes every HTTP request from all users, and each request contains at least the IP address, the requested page, the time of the request, the response code, and the size of the item requested. The access log therefore has the advantage of being extremely detailed, but it also has drawbacks. When statistical and probabilistic methods are applied to it, the results tend to be more fine-grained than they should be, because the analysis may focus on micro trends rather than macro trends. Moreover, based on our observation, users' browsing behavior on the Web is highly uncertain. Users might browse the same page for different purposes, spend various amounts of time on it, visit it a different number of times, or even reach it from different sources each time. Micro trends therefore tend to be erroneous and not of much use.

    Depending on the nature of the data, one can distinguish three main areas of research within the Web mining community:

    Web Content Mining: application of data mining techniques to unstructured or semi-structured data, usually HTML documents.

    Web Structure Mining: use of the hyperlink structure of the Web as an (additional) information source.

    Web Usage Mining: analysis of user interactions with a Web server (e.g., click-stream analysis).

    AN OVERVIEW OF WEB USAGE MINING

    Web Usage Mining Process:

    The main processes in Web Usage Mining are:

    Preprocessing: Data preprocessing is any type of processing performed on raw data to prepare it for another processing procedure. It is commonly used as a preliminary data mining practice and transforms the data into a format that can be processed more easily and effectively for the purpose of the user.

    The different types of preprocessing in Web Usage Mining are:

    1. Usage Preprocessing: preprocessing related to users' usage patterns.

    2. Content Preprocessing: preprocessing of the content accessed.

    3. Structure Preprocessing: preprocessing related to the structure of the Web site.

    Pattern Discovery: Web usage mining can be used to uncover patterns in server logs, but it is often carried out only on samples of the data. The mining process will be ineffective if the samples are not a good representation of the larger body of data.

    The main pattern discovery methods are the following:

    1. Statistical Analysis

    2. Association Rules

    3. Clustering

    4. Classification

    5. Sequential Patterns

    6. Dependency Modeling

    Recently, data mining techniques have been applied to extract usage patterns from Web log data. This process, known as Web usage mining, is traditionally performed in several stages:

    1. Collection of Web data, such as activities or click streams recorded in Web server logs.

    2. Preprocessing of Web data, such as filtering out crawler requests and requests for graphics, and identifying unique sessions.

    3. Analysis of Web data, also known as Web usage mining, to discover interesting usage patterns or profiles.

    4. Interpretation and evaluation of the discovered profiles.

    In this paper, a fifth step is added after steps 1-4 have been applied repeatedly over multiple time periods:

    5. Tracking the evolution of the discovered profiles.
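The filtering part of stage 2 can be sketched in a few lines. This is a minimal sketch. It assumes each log entry has already been split into a requested URL and a user-agent string. The crawler hints and the list of graphics/resource extensions are hypothetical heuristics, not rules taken from this work.

```python
import re

# Hypothetical heuristics, not taken from the text: substrings that often
# appear in crawler user agents, and extensions of embedded graphics/resources.
CRAWLER_HINTS = ("bot", "crawler", "spider", "slurp")
RESOURCE_EXT = re.compile(r"\.(gif|jpe?g|png|ico|css|js)$", re.IGNORECASE)


def keep_request(url: str, user_agent: str) -> bool:
    """Return True if a log entry should survive preprocessing."""
    agent = user_agent.lower()
    if any(hint in agent for hint in CRAWLER_HINTS):
        return False  # drop requests issued by crawlers
    path = url.split("?", 1)[0]  # ignore the query string
    if RESOURCE_EXT.search(path):
        return False  # drop requests for graphics and other embedded files
    return True
```

For example, `keep_request("/logo.png", "Mozilla/5.0")` and `keep_request("/news.html", "Googlebot/2.1")` both return False. Real deployments usually rely on maintained crawler lists rather than substring checks.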

    Web usage mining can use various data mining or machine learning techniques to model and understand Web user activity. Clustering has been used to segment user sessions into clusters or profiles that can later form the basis for personalization. The notion of an adaptive Web site has been proposed, in which users' access patterns are used to automatically synthesize index pages. Other work uses association rule discovery as the basis for modeling Web user activity, while another approach uses probabilistic grammars to model Web navigation patterns for the purpose of prediction. Yet another approach builds data cubes from Web log data and then applies Online Analytical Processing (OLAP) and data mining to the cube model. The Web Utilization Miner (WUM) was presented to discover navigation patterns with user-specified characteristics over an aggregated materialized view of the Web log, consisting of a trie of sequences of Web views. Studying how Web usage evolves has recently become important. Access patterns on a Web site are dynamic, due not only to changes in site content and structure but also to changes in users' interests and, thus, their navigation patterns.

    In order to create user groups, i.e., user profiles, only the users' browsing history is available. We assume that users with similar browsing patterns at a given point in time have similar interests and motivations. User profiles can be created through online Web usage mining, which consists of discovering Web usage patterns to better understand users' behavior. To create user groups from Web server access logs, Web usage mining uses clustering techniques. During a visit to a Web site, the user's requests are recorded in an access log stored on the Web server. The Web server access logs therefore provide the means to build a data set suitable for clustering algorithms. Because users' interests are affected by the temporal context, some research work clusters sessions instead of users. A session comprises the browsing history within a small time window, usually 30 minutes. By clustering sessions, it is easier to understand the contextual motivations of each user and to provide ads suited to the user's current interests. One proposal is to create user profiles in a two-stage process. First, session clusters are created. Then, user clusters are created based on the sessions that users have in common. This work focuses on the first part, creating session clusters. It compares the resulting session clusters when different attributes are used to describe a session. Our approach to representing a session combines descriptions extracted from the URLs of the visited pages with date-based temporal frames, such as "Monday morning".
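A minimal sketch of such a session representation follows. It assumes a session is a list of (URL, timestamp) pairs. Several choices are hypothetical and made only for illustration: the tokenization of URL paths, the stop tokens, the day-part boundaries, and the `frame=` feature prefix. The text only says that URL descriptions are combined with date-based frames such as "Monday morning".

```python
import re
from collections import Counter
from datetime import datetime
from urllib.parse import urlparse

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
STOP_TOKENS = {"html", "htm", "php", "asp", "aspx"}  # hypothetical


def temporal_frame(ts: datetime) -> str:
    """Map a timestamp to a coarse frame such as 'Monday morning'."""
    # Hypothetical day-part boundaries; the text does not define them.
    if 6 <= ts.hour < 12:
        part = "morning"
    elif 12 <= ts.hour < 18:
        part = "afternoon"
    elif ts.hour >= 18:
        part = "evening"
    else:
        part = "night"
    return f"{DAYS[ts.weekday()]} {part}"


def url_terms(url: str) -> list[str]:
    """Split a URL path into lowercase tokens, e.g. /sports/football.html -> sports, football."""
    path = urlparse(url).path.lower()
    return [t for t in re.split(r"[/\-_.]+", path) if t and t not in STOP_TOKENS]


def session_features(visits: list[tuple[str, datetime]]) -> Counter:
    """Bag of URL terms plus the temporal frame of the session's first visit."""
    features = Counter()
    for url, _ in visits:
        features.update(url_terms(url))
    features["frame=" + temporal_frame(visits[0][1])] += 1
    return features
```

A session with two sports pages visited on a Monday at 09:15 yields the counts sports: 2, football: 1, tennis: 1, and frame=Monday morning: 1. Each distinct key becomes one dimension of the session vector.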

    Session Identification

    A session is a list of Web page accesses by a given user during a period of time. Each access is recorded as a line in the Web server access log. To identify the list of Web pages visited during a user's session, all information in the access logs that is meaningless or irrelevant must first be cleaned out. However, browser and proxy caching are a major obstacle to creating a reliable user session data set. Requests served from a cache never reach the server and are therefore missing from its log. The Web server access log is a text file that contains all the requests made to the Web server. It is usually written in the Common Log Format or in its extension, the Combined Log Format, which adds the referrer and the user agent. An entry contains the following fields:

    • IP address or domain name

    • User ID

    • Date and time of the request

    • HTTP request (including method and page requested)

    • Status code of the response to the request

    • File size

    • Referrer (Web page that contains the hyperlink that originated the request)

    • Web agent (user's browser)
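A log line carrying these fields can be split with a regular expression. This sketch assumes the standard Apache/NCSA Combined Log Format: `host ident authuser [date] "request" status bytes "referer" "user-agent"`. The logs used in this work carry extra cookie fields, which this pattern does not parse. The function name `parse_line` is hypothetical.

```python
import re
from datetime import datetime

# Apache/NCSA Combined Log Format:
# host ident authuser [date] "request" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line: str):
    """Return a dict of fields for one log line, or None if it does not match."""
    m = LOG_RE.match(line)
    if m is None:
        return None
    entry = m.groupdict()
    # Timestamps look like 09/Apr/2018:09:15:02 +0200
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    # The request line is "METHOD /page PROTOCOL"; keep the method and the page.
    method, _, rest = entry["request"].partition(" ")
    entry["method"] = method
    entry["page"] = rest.rsplit(" ", 1)[0] if rest else ""
    return entry
```

Lines that do not match the pattern return None, so malformed entries can be dropped during cleaning.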

    The Web server access logs used in this work contain accesses to Web pages from several Web sites. In this case, the URL of the Web page is in the referrer field. Each entry also carries extra information about the request, such as a session cookie and a long-duration cookie. The session cookie identifies a 30-minute session, and the long-duration cookie identifies a user. Therefore, only access log entries containing the session cookie were considered. From these entries, the Web page URL (referrer), the date, and the session cookie are the meaningful data for the purpose of this work. These parameters were then grouped by common session cookie to create each session's representation vector.
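The grouping step can be sketched as follows. This assumes the log has already been reduced to (session cookie, referrer URL, timestamp) tuples. The function name `build_sessions` is hypothetical.

```python
from collections import defaultdict


def build_sessions(entries):
    """Group (session_cookie, referrer_url, timestamp) tuples by session cookie.

    Entries without a session cookie are skipped, as described in the text.
    Returns {cookie: [(url, timestamp), ...]} with each list sorted by timestamp.
    """
    sessions = defaultdict(list)
    for cookie, url, ts in entries:
        if not cookie:
            continue  # no session cookie: entry is not considered
        sessions[cookie].append((url, ts))
    return {c: sorted(v, key=lambda p: p[1]) for c, v in sessions.items()}
```

Each resulting visit list can then be turned into a representation vector, for example a bag of URL terms plus a temporal frame.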

    Among the pattern discovery methods, clustering allows us to group together clients or data items that have similar characteristics. The information discovered by this technique is among the most important kinds, with applications ranging from real-time personalization to link prediction. It can support future marketing strategies, such as:

    • sending automated return mail or presenting advertisements to clients within a certain cluster;

    • dynamically changing a particular site for a client on a return visit, based on that client's past classification.

    The key problem lies in how to effectively discover clusters of Web pages or users with common interests. Clustering analysis for mining the Web is quite different from traditional clustering, because Web usage data differ inherently from the data handled by classic clustering. There is therefore a need for specialized clustering techniques for Web usage data, and several approaches have been developed for mining Web access logs.

    Session Clustering

    After user sessions have been transformed into vectors of extracted attributes in a multidimensional space, clustering algorithms can partition this space into groups of sessions. Each session in a group is close to the other sessions in the group according to a distance measure. Both model-based and similarity-based clustering algorithms are used to group users or sessions, as are hierarchical and partitional techniques. The most common model-based algorithm is the Expectation-Maximization (EM) algorithm. It has been used to identify associations among users and pages as well as to build user profiles.
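The text does not fix a particular distance measure. The sketch below uses cosine distance as one common choice; this is an assumption, not the method of this work. Session vectors are stored sparsely as dictionaries that map attribute names to weights.

```python
import math


def cosine_distance(a: dict, b: dict) -> float:
    """1 - cosine similarity of two sparse vectors given as {attribute: weight}."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # treat an empty session as maximally distant
    return 1.0 - dot / (norm_a * norm_b)
```

Cosine distance ignores session length: two sessions that visit the same topics in the same proportions have distance 0, even if one is much longer. That is often desirable for browsing data.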

    The proposed methodology is applied to Web users' navigation patterns through a model-based approach employing:

    Cluster validation, i.e., evaluating the results of a clustering algorithm in a quantitative and objective manner. We propose a quantitative validation procedure based on the statistical chi-square (χ²) test. Each cluster is represented by a probability distribution. The chi-square metric measures the distances between these distributions and tests their homogeneity. The goal of a clustering procedure is to discover groups in the data such that each group differs significantly from all the others. The test therefore essentially checks the heterogeneity between clusters to assess how well they are discriminated.

    Cluster interpretation, i.e., understanding and appropriately interpreting the meaning of the derived clusters in the wider context of the underlying application by means of statistical data analysis. Specifically, we propose a visualization approach for interpreting the clustering results, based on the statistical method known as correspondence analysis. This analysis helps reveal similar or related features in Web users' navigation behavior and in their interaction with the content of Web information sources.
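The chi-square computation behind the validation step can be sketched as follows. The sketch assumes a hypothetical layout: each cluster is summarized as a row of counts over the same set of page categories. It returns Pearson's statistic and the degrees of freedom, and the statistic is then compared with a critical value from chi-square tables. Empty rows or columns must be removed first, because they would produce zero expected counts.

```python
def chi_square_homogeneity(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table.

    table: list of rows of non-negative counts (rows = clusters,
    columns = page categories); no row or column may sum to zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if all clusters shared the same distribution.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return stat, dof
```

For the table [[30, 10, 10], [10, 30, 10]], the function returns a statistic of 20.0 with 2 degrees of freedom. This exceeds 5.991, the 5% critical value for 2 degrees of freedom. The hypothesis that both clusters share one distribution is therefore rejected, so the clusters are well discriminated.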
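For reference, standard correspondence analysis of an r × c contingency table N (here, clusters × page categories, with grand total n) can be written as follows. This is the textbook formulation, given as a sketch; the text does not specify which variant is used.

$$
\begin{aligned}
P &= N/n, \quad \mathbf{r} = P\mathbf{1}, \quad \mathbf{c} = P^{\top}\mathbf{1}, \quad D_r = \operatorname{diag}(\mathbf{r}), \quad D_c = \operatorname{diag}(\mathbf{c}) \\
S &= D_r^{-1/2}\,(P - \mathbf{r}\mathbf{c}^{\top})\,D_c^{-1/2} = U \Sigma V^{\top} \\
F &= D_r^{-1/2} U \Sigma, \qquad G = D_c^{-1/2} V \Sigma \\
\textstyle\sum_k \sigma_k^2 &= \textstyle\sum_{i,j} s_{ij}^2 = \chi^2 / n
\end{aligned}
$$

Plotting the rows of F (clusters) and the rows of G (page categories) on the first two principal axes gives the joint map used for interpretation. The total inertia equals the chi-square statistic divided by n, which links the interpretation step to the chi-square validation step.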

    Clustering evaluation can be carried out from three different views:

    1. External view: the results of a clustering method are evaluated against a pre-specified structure on the data set, which reflects a user's intuition about the clustering structure of that data set.

    2. Internal view: the clustering results are evaluated in terms of quantities obtained from the data set itself.

    3. Relative view: a clustering result is compared with other clustering schemes produced by the same algorithm with different parameter values.

    System Architecture: [Figure: Web user session clustering technique, with the stages Trace user session details, Clustering Selection, Hierarchical Agglomerative, and Clustering Creation.]

    Modules

    1. Trace user details

    2. Clustering selection

    3. Hierarchical Agglomerative

    4. Clustering Creation

    1. Trace User Details

    In this module we trace user session details. A Web user is identified by its client IP address, and only connections whose TCP server port is 80 (HTTP) are considered. Each user trace (a trace containing only data with a given source IP address) is preprocessed in three steps. First, the data are partitioned day by day. Second, only working hours of working days are considered. Third, when two consecutive connections have opening times more than half an hour apart, they are treated a priori as belonging to two independent data sets.
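The three preprocessing steps can be sketched as follows. This is a minimal sketch. It assumes connections are given as (client IP, server port, opening time) tuples. The working-hours window (08:00-18:00, Monday to Friday) is a hypothetical choice, since the text does not define working hours.

```python
from datetime import datetime, timedelta
from itertools import groupby

GAP = timedelta(minutes=30)   # "more than half an hour" from the text
WORK_START, WORK_END = 8, 18  # hypothetical working hours (not given in the text)


def is_working_time(t: datetime) -> bool:
    """Monday-Friday, within the assumed working hours."""
    return t.weekday() < 5 and WORK_START <= t.hour < WORK_END


def trace_user(connections):
    """Split per-IP HTTP connection opening times into independent data sets.

    connections: iterable of (client_ip, server_port, open_time) tuples.
    Returns {ip: [[open_time, ...], ...]}, one inner list per data set.
    """
    http = sorted((c for c in connections if c[1] == 80),
                  key=lambda c: (c[0], c[2]))
    result = {}
    for ip, conns in groupby(http, key=lambda c: c[0]):
        # ii) keep only working hours of working days
        times = [t for _, _, t in conns if is_working_time(t)]
        data_sets = []
        # i) partition day by day (times are already sorted)
        for _, day in groupby(times, key=lambda t: t.date()):
            current = []
            for t in day:
                # iii) a gap longer than GAP starts a new data set
                if current and t - current[-1] > GAP:
                    data_sets.append(current)
                    current = []
                current.append(t)
            data_sets.append(current)
        result[ip] = data_sets
    return result
```

For example, consider three port-80 connections on a Monday at 09:00, 09:10, and 09:50 from one IP. They yield two data sets, [09:00, 09:10] and [09:50]. Port-443 and weekend connections are discarded.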

    2. Clustering Selection

    In this module we read the user session details and cluster them initially using K-means clustering.
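A minimal sketch of the initial K-means step follows, using plain Lloyd iterations on equal-length numeric tuples. The random initialization, the iteration count, and the function names are assumptions made for illustration. The next module uses only representative samples. On one reading (an interpretation, not stated in the text), k is chosen larger than the expected number of clusters, so that each centroid acts as such a representative.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points]


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd k-means on a list of equal-length numeric tuples.

    Returns (centroids, labels).
    """
    centroids = random.Random(seed).sample(points, k)  # k input points chosen at random
    for _ in range(iterations):
        labels = assign(points, centroids)
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster becomes empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, assign(points, centroids)
```

Lloyd iterations only reach a local optimum. Running the function with several seeds and keeping the lowest within-cluster error is a common safeguard.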

    3. Hierarchical Agglomerative

    In this module a hierarchical agglomerative algorithm is run iteratively, using only the representative samples to evaluate the distance between two clusters. Because the procedure starts from a finite set of initial clusters, the number of steps is bounded. At each step, the procedure merges the two closest clusters and then recomputes the distances among clusters. Starting from n initial clusters, the process therefore ends after at most n - 1 iterations.
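The merging loop can be sketched as below. The text does not say how the distance between two clusters is computed from their representatives. This sketch assumes single linkage, i.e., the minimum distance between representatives. It records each merge distance so that a stopping point can be chosen afterwards. `choose_k` stops just before the largest jump in merge distance. It is a hypothetical heuristic for the "optimal number of clusters", not the criterion used in this work.

```python
import math


def agglomerate(reps):
    """Single-linkage agglomeration of representative points.

    reps: list of equal-length numeric tuples (e.g. k-means centroids).
    Returns the merge history as (distance, merged_members) pairs,
    one entry per merge, i.e. len(reps) - 1 entries.
    """
    clusters = [[r] for r in reps]
    history = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: closest pair of representatives.
                d = min(math.dist(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merged = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
        clusters[i] = merged
        history.append((d, merged))
    return history


def choose_k(history, n_reps):
    """Hypothetical rule: stop just before the largest jump in merge distance."""
    dists = [d for d, _ in history]
    if len(dists) < 2:
        return 1
    gaps = [b - a for a, b in zip(dists, dists[1:])]
    i = max(range(len(gaps)), key=gaps.__getitem__)
    return n_reps - (i + 1)  # clusters left after performing merges 0..i
```

With representatives at 0, 1, 3, 20, and 24, the merge distances are 1, 2, 4, and 17. The largest jump (from 4 to 17) suggests two clusters.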

    4. Clustering Creation

    A partitional clustering procedure is run over the original data set, which includes all samples. It uses the optimal number of clusters determined so far and the same choice of cluster representatives adopted in the first step. A fixed number of iterations is run to obtain a final refinement of the clustering.
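The final refinement can be sketched as a fixed number of K-means (Lloyd) iterations over all samples. The iterations are seeded with one centroid per cluster found by the agglomerative step, for example the mean of each merged group of representatives. The seeding rule and the default of 5 iterations are assumptions; the text only states that a fixed number of iterations is run.

```python
import math


def nearest(p, centroids):
    """Index of the centroid closest to point p (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))


def refine(points, seeds, iterations=5):
    """Run a fixed number of Lloyd iterations over all points.

    seeds: one initial centroid per cluster (e.g. the mean of each group of
    representatives produced by the agglomerative step).
    Returns (centroids, labels).
    """
    centroids = [tuple(s) for s in seeds]
    for _ in range(iterations):
        labels = [nearest(p, centroids) for p in points]
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster becomes empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, [nearest(p, centroids) for p in points]
```

A fixed iteration count bounds the running time on the full data set. Good seeds from the previous step usually leave only small adjustments to make.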

    LANGUAGE SPECIFICATION

    4.1 FEATURES OF .NET

    Microsoft .NET is a set of Microsoft software technologies for rapidly building and integrating XML Web services, Microsoft Windows-based applications, and Web solutions. The .NET Framework is a language-neutral platform for writing programs that can easily and securely interoperate. There is no language barrier with .NET: numerous languages are available to the developer, including Managed C++, C#, Visual Basic, and JScript .NET. The .NET Framework provides the foundation for components to interact seamlessly, whether locally or remotely on different platforms. It standardizes common data types and communication protocols so that components created in different languages can easily interoperate.

    .NET is also the collective name given to various software components built upon the .NET platform. These include both products (Visual Studio .NET and Windows .NET Server, for instance) and services (such as Passport, .NET My Services, and so on).

    THE .NET FRAMEWORK

    The .NET Framework has two main parts:

    1. The Common Language Runtime (CLR).

    2. A hierarchical set of class libraries.

    The CLR is described as the execution engine of .NET. It provides the environment within which programs run. Its most important features are:

    Conversion from a low-level, assembler-style language called Intermediate Language (IL) into code native to the platform on which it executes.

    Memory management, notably including garbage collection.

    Checking and enforcing security restrictions on the running code.

    Loading and executing programs, with version control and other such features.

    The following features of the .NET Framework are also worth describing:

    Managed Code

    Managed code is code that targets .NET and contains certain extra information, called metadata, that describes it. Both managed and unmanaged code can run in the runtime, but only managed code contains the information that allows the CLR to guarantee, for instance, safe execution and interoperability.

    Managed Data

    With managed code comes managed data. The CLR provides memory allocation and deallocation facilities, as well as garbage collection. Some .NET languages use managed data by default, such as C#, Visual Basic .NET, and JScript .NET, whereas others, namely C++, do not. Depending on the language you are using, targeting the CLR can impose certain constraints on the available features. As with managed and unmanaged code, one can have both managed and unmanaged data in .NET applications. Unmanaged data is not garbage collected; it is looked after by unmanaged code instead.

    Common Type System

    The CLR uses the Common Type System (CTS) to strictly enforce type safety. The CTS describes types in a common way, which ensures that all classes are compatible with each other. It defines how types work within the runtime, so that types in one language can interoperate with types in another language, including cross-language exception handling. Besides ensuring that types are used only in appropriate ways, the runtime also ensures that code does not attempt to access memory that has not been allocated to it.

    Common Language Specification

    The CLR provides built-in support for language interoperability. Managed code should be fully usable by developers working in any programming language. To ensure this, a set of language features and rules for using them has been defined, called the Common Language Specification (CLS). Components that follow these rules and expose only CLS features are considered CLS-compliant.

    THE CLASS LIBRARY

    .NET provides a single-rooted hierarchy of classes containing over 7000 types. The root of the namespace is called System. It contains basic types such as Byte, Double, Boolean, and String, as well as Object. All objects derive from System.Object. In addition to objects, there are value types. Value types can be allocated on the stack, which can provide useful flexibility. There are also efficient means of converting value types to object types (boxing) if and when necessary.

    The set of classes is pretty comprehensive, providing collections, file, screen, and network I/O, threading, and so on, as well as XML and database connectivity.

    The class library is subdivided into a number of sets (or namespaces), each providing distinct areas of functionality, with dependencies between the namespaces kept to a minimum.

    LANGUAGES SUPPORTED BY .NET

    The multi-language capability of the .NET Framework and Visual Studio .NET enables developers to use their existing programming skills to build all types of applications and XML Web services. The .NET Framework supports new versions of Microsoft's old favorites Visual Basic and C++ (as VB.NET and Managed C++), but there are also a number of new additions to the family.

    Visual Basic .NET has been updated to include many new and improved language features that make it a powerful object-oriented programming language. These features include inheritance, interfaces, and overloading, among others. Visual Basic now also supports structured exception handling, custom attributes, and multithreading.

    Visual Basic .NET is also CLS-compliant, which means that any CLS-compliant language can use the classes, objects, and components you create in Visual Basic .NET.

    Managed Extensions for C++ and attributed programming are just some of the enhancements made to the C++ language. Managed Extensions simplify the task of migrating existing C++ applications to the new .NET Framework.

    C# is Microsoft's new language. It is a C-style language that is essentially "C++ for Rapid Application Development". Unlike many other languages, C# does not come with a separate standard library of its own. Instead, it was designed to use the .NET libraries as its own.

    Microsoft Visual J# .NET provides the easiest transition for Java-language developers into the world of XML Web services. It dramatically improves the interoperability of Java-language programs with existing software written in a variety of other programming languages.

    ActiveState has created Visual Perl and Visual Python, which enable .NET-aware applications to be built in either Perl or Python. Both products can be integrated into the Visual Studio .NET environment. Visual Perl includes support for ActiveState's Perl Dev Kit.

    Other languages for which .NET compilers are available include FORTRAN, COBOL, and Eiffel.

    Fig. 1: .NET Framework [figure: layers from top to bottom are ASP.NET, XML Web Services, and Windows Forms; Base Class Libraries; Common Language Runtime; Operating System]

    4.2 FEATURES OF C#.NET

    C#.NET is also compliant with the CLS (Common Language Specification) and supports structured exception handling. The CLS is a set of rules and constructs that are supported by the CLR (Common Language Runtime). The CLR is the runtime environment provided by the .NET Framework. It manages the execution of code and also makes development easier by providing services.

    C#.NET is a CLS-compliant language. Any objects, classes, or components created in C#.NET can be used in any other CLS-compliant language. Likewise, objects, classes, and components created in other CLS-compliant languages can be used in C#.NET. The use of the CLS ensures complete interoperability among applications, regardless of the languages used to create them.

    CONSTRUCTORS AND DESTRUCTORS:

    Constructors are used to initialize objects, whereas destructors are used to destroy them; in other words, destructors release the resources allocated to the object. In C#.NET, a destructor (finalizer) is declared with the syntax ~ClassName() and is compiled into an override of the Finalize method. It performs the tasks that must be completed when an object is destroyed. A destructor cannot be called explicitly. The garbage collector calls it automatically before reclaiming the object's memory.

    GARBAGE COLLECTION

    Garbage collection is another feature of C#.NET. The .NET Framework monitors allocated resources, such as objects and variables. It automatically releases memory for reuse by destroying objects that are no longer in use.

    In C#.NET, the garbage collector looks for objects that are no longer referenced by the application. When it finds such an unreachable object, it releases the memory occupied by the object.

    OVERLOADING

    Overloading is another feature of C#. It allows multiple methods to share the same name, provided each method has a different set of parameters. Besides methods, constructors and indexers in a class can also be overloaded.

    MULTITHREADING:

    C#.NET also supports multithreading. An application that supports multithreading can handle multiple tasks simultaneously. Multithreading can be used to reduce the time an application takes to respond to user interaction.

    STRUCTURED EXCEPTION HANDLING

    C#.NET supports structured exception handling, which enables us to detect and handle errors at runtime. In C#.NET, exception handlers are created with try...catch...finally statements. These statements allow robust and effective exception handlers that improve the reliability of the application.

    THE .NET FRAMEWORK

    The .NET Framework is a new computing platform that simplifies application development in the highly distributed environment of the Internet.

    OBJECTIVES OF .NET FRAMEWORK

    1. To provide a consistent object-oriented programming environment, whether object code is stored and executed locally, executed locally but Internet-distributed, or executed remotely.

    2. To provide a code-execution environment that minimizes software deployment and versioning conflicts and guarantees safe execution of code.

    3. To provide a code-execution environment that eliminates the performance problems of scripted or interpreted environments.

    There are different types of applications, such as Windows-based applications and Web-based applications.

