
softcopyvasu

Date post: 08-Apr-2018
Upload: manju-yanamala
  • 8/6/2019 softcopyvasu

    1/85

    Website Fetcher

    1

    Abstract

    The On Demand Web Spidering and Fetching system is a multithreaded Windows application that downloads and stores the Uniform Resource Identifiers (URIs) of Web pages for a Web search engine. Roughly, a crawler starts off by placing an initial set of URLs, S0, in a queue, where all URLs to be retrieved are kept and prioritized. From this queue, the crawler gets a URL (in some order), downloads the page, extracts any URLs in the downloaded page, and puts the new URLs in the queue. This process is repeated until the crawler decides to stop. Collected pages are later used for other applications, such as a Web search engine or a Web cache.

    On Demand Web Spidering and Fetching aims to develop a user interface that retrieves information about a given website. It is a multithreaded Windows application that downloads and stores the Uniform Resource Identifiers of that website. The application is intended for use as a back-end processing component of a search engine: the results gathered by the Website Fetcher are passed to an indexer, which indexes the page data so that search queries return results faster. Once implemented, the system can connect to websites and download data that, after indexing, can be supplied to the search engine.

    A crawler is a program that visits Web sites and reads their pages and other information

    in order to create entries for a search engine index. The major search engines on the Web

    all have such a program, which is also known as a "spider" or a "bot."


    ACKNOWLEDGMENT

    We express our thanks and gratitude to Almighty God, our parents and our friends, without whose unstinted support we could not have completed this project in 2008-2011. We wish to place on record our deep sense of gratitude to our principal, Mr. G. Sudhakar, our H.O.D., Mr. M. Sreenivas, and our project guide, Mr. Brahma, for their constant motivation and valuable help throughout the project work. We express our gratitude to iMatrix Technologies for their valuable suggestions and advice throughout the project. We also extend our thanks to the other faculty members for their cooperation during our course. Finally we would like to thank.

    INDEX


    S.NO TITLE

    1. Organization Profile

    2. Project Overview

    3. Aim & Scope of the Project

    4. Existing System

    5. Proposed System

    6. Software Requirements

    7. Hardware Requirements

    8. Feasibility Study

    9. System Design Introduction

    10. Database Design

    11. Data Dictionary

    12. UML Diagrams

    13. Unit Testing

    14. Integration Testing

    15. System Testing

    16. System Implementation

    17. Screens

    18. Reports

    19. Technology Specification

    20. Conclusion

    21. Bibliography and References


    INTRODUCTION

    ORGANIZATION PROFILE: iMATRIX TECHNOLOGIES


    iMatrix is an innovative organization with a unique blend of industry and academia. We would like to introduce ourselves as a team of fully qualified professionals in the field of IT education. We are dedicated to developing industry-driven academic programs that bridge the gap between students and employers, benefiting both.

    We provide all-inclusive and cost-effective training for students looking to take the first step towards new careers. Effective learning happens when the student is stimulated and challenged to go beyond their current frame of thinking and create a new reality about the subject. We at iMatrix always strive for the best. The curriculum and course material have been prepared after intensive research and consultation with industry and academia so that they can be easily grasped by students of all levels.

    We are confident that all of the training we deliver is of a high standard, and we are constantly striving to improve it further. All our training modules are well designed, well equipped and fit for purpose, and are delivered by motivational and inspirational trainers who make learning interesting and fun. Every student has the freedom to discuss and learn. We understand the current requirements of both the industry and the students, so students can gain industry experience while still studying at the institute.

    Although we have the best resources to deliver the best results, we believe that it is not money or manpower alone but team effort that takes an organization to new heights. Our team provides a highly conducive and congenial environment for learning. All our faculty members are experienced working professionals who share their expertise with the students and keep themselves up to date. Because they have already solved the problems the students are likely to encounter, they can share real-world experience and practical solutions with them. Everything the students learn is practical, relevant, and up to date.

    Why Us?

    We have certified instructor-led training.

    We provide you unlimited access to our labs to practice what you've learned.

    We provide you with test preparation tools to assist you in preparing for your exams.

    We assist you in preparing your resume and entering the workforce.

    We provide books, reference materials, and guide you through the learning process.

    At iMatrix Technologies, we provide Live Project Training to B.E. (Computers), B.E. (IT), B.Tech, MBA, MCA and M.Tech students. Our project training is based on live projects from customers across the globe, on platforms such as VB.NET, ASP.NET, VC++.NET, C#.NET, Java, JSP, Servlets, J2EE and J2ME.

    We provide the students with adequate infrastructure so that they can practice and further enhance their skills. Practicing on their own gives them the confidence required to meet the challenging demands of the job world. We prepare students for the real-world software industry, and they become industry-ready professionals after completing our Live Project Training program.

    Our intent is not only to teach a language but also to help students gain practical knowledge. This training program helps students gain practical, live experience, thereby strengthening the knowledge they gained during their graduation. This hands-on experience helps fill the gap that prevents the majority of freshers from being considered by prospective employers due to lack of experience. Our training program focuses on developing the students' ability to think dynamically and logically.

    The Live Project Training program gives candidates hands-on experience with the real-world development process through the complete Software Development Life Cycle. It consists of two phases: technology training and project execution.

    Phase 1 - Technology Training

    Our rigorous technology training gives students in-depth instruction through both theoretical and practical sessions. Training places a high value on sharing information and experience.

    Stress on the basics

    Interactive Sessions

    Comprehensive material

    Practice Tests

    Attention to every individual

    Phase 2 - Project Execution

    In this phase, the students apply the knowledge they acquired in Phase 1. They design, code, test and deploy the application.

    Challenging projects dealing with cutting-edge technologies.

    All students are evaluated and mentored by people from the IT industry.

    All industry best practices are explained and implemented in the projects.


    The latest tools are used to execute the projects.

    Weekly project review, attendance, workshops, final test and interview are used to assess

    the students.

    Certificates are awarded on successful completion of the project.

    Other Features

    Friendly faculty and a lively work atmosphere.

    Seminars from Industry professionals.

    Complete module is designed and executed under the guidance of IT professionals.

    Attendance is shared with the college on a fortnightly basis.

    Constant inputs will be taken from the respective HODs of the students.

    Project Overview

    The Website Fetcher is a multithreaded Windows application that downloads and stores the Uniform Resource Identifiers (URIs) of Web pages for a Web search engine. Roughly, a crawler starts off by placing an initial set of URLs, S0, in a queue, where all URLs to be retrieved are kept and prioritized. From this queue, the crawler gets a URL (in some order), downloads the page, extracts any URLs in the downloaded page, and puts the new URLs in the queue. This process is repeated until the crawler decides to stop. Collected pages are later used for other applications, such as a Web search engine or a Web cache.
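The crawl loop described above can be sketched in Python. This is a minimal, single-threaded illustration under stated assumptions, not the project's C# implementation: the queue is a plain FIFO (the text leaves the order open), `fetch` is a hypothetical callable that returns a page's HTML or `None` on failure, and `max_pages` stands in for "the crawler decides to stop".

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Optional
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collects the href values of <a> tags in one HTML page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seeds: list[str],
          fetch: Callable[[str], Optional[str]],
          max_pages: int = 100) -> list[str]:
    """Take a URL from the queue, download it, extract its links and
    enqueue the ones not seen before, until the queue is empty or
    max_pages pages have been downloaded."""
    queue = deque(seeds)          # the frontier, initialised with S0
    seen = set(seeds)             # every URL ever enqueued
    visited: list[str] = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()     # FIFO order: a breadth-first crawl
        html = fetch(url)
        if html is None:          # download failed; skip this URL
            continue
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        parser.close()
        for href in parser.links:
            # Resolve relative links and drop any "#fragment" part.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited


# Tiny in-memory "web" so the sketch runs without network access.
pages = {
    "http://example.com/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.com/a": '<a href="/b">B</a>',
    "http://example.com/b": '<a href="/#top">home</a>',
}
print(crawl(["http://example.com/"], pages.get))
# ['http://example.com/', 'http://example.com/a', 'http://example.com/b']
```

Passing a dictionary's `get` method as `fetch` keeps the example offline; a real crawler would plug an HTTP download in at that point, and could replace the FIFO deque with a priority queue to get the "prioritized" ordering the text mentions.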


    As the size of the Web grows, it becomes more difficult to retrieve the whole or a significant portion of the Web using a single process. Therefore, many search engines run multiple processes in parallel to perform the above task, so that the download rate is maximized. We refer to this type of fetcher as a parallel crawler. Applications of this type are used in search engines that need to collect all the URIs relevant to a query and index them by priority.

    This application is a .NET-based fetcher, similar in purpose to Googlebot, Google's crawler. It is intended for use as a back-end processing component of a search engine: the results (URI data) gathered by the Website Fetcher are passed to an indexer, which indexes the page data so that search queries return results faster.

    Configurator module:

    Mime types:

    Here the user sets which kinds of data are to be extracted from a given URI, for example whether string data, Boolean data and image information are needed or not.

    Output settings:

    Here the user sets which kinds of data are to be extracted from a given URI, for example whether string data, Boolean data and image information are needed or not.

    Advanced settings:

    These settings let the user exclude certain kinds of websites, for example those whose domain names end in .net or .ac.in.
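The MIME-type and advanced settings could be modelled as one settings object that is consulted before a URI is fetched and before a response is stored. The Python sketch below is an assumption throughout: the field names (`store_text`, `store_images`, `blocked_suffixes`) and the rule of matching on the MIME type prefix are hypothetical, since the report does not describe the actual settings format.

```python
from dataclasses import dataclass, field
from urllib.parse import urlparse


@dataclass
class FetcherSettings:
    """Hypothetical configurator settings (names are illustrative only)."""
    store_text: bool = True       # MIME types: keep text/* responses
    store_images: bool = False    # MIME types: keep image/* responses
    # Advanced settings: domain suffixes to exclude, e.g. [".net", ".ac.in"]
    blocked_suffixes: list[str] = field(default_factory=list)

    def allows_url(self, url: str) -> bool:
        """Reject URLs whose host name ends in a blocked suffix."""
        host = (urlparse(url).hostname or "").lower()
        return not any(host.endswith(s.lower()) for s in self.blocked_suffixes)

    def allows_mime(self, content_type: str) -> bool:
        """Decide from a Content-Type header value whether to keep a response."""
        mime = content_type.split(";")[0].strip().lower()
        if mime.startswith("text/"):
            return self.store_text
        if mime.startswith("image/"):
            return self.store_images
        return False              # any other type is skipped


settings = FetcherSettings(blocked_suffixes=[".net", ".ac.in"])
print(settings.allows_url("http://www.example.net/page"))   # False
print(settings.allows_url("http://example.com/"))           # True
print(settings.allows_mime("text/html; charset=utf-8"))     # True
print(settings.allows_mime("image/png"))                    # False
```

Keeping the suffixes with their leading dot means ".net" matches "www.example.net" but not a host such as "planet.com".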


    Multithreaded downloader:

    The multithreaded downloader is responsible for starting the threads and obtaining information about the website being fetched. It pushes all URIs onto a single queue; each thread starts with one URI from the queue and, after completing it, moves on to the next URI in the queue. This module creates a folder at the path chosen by the user, and in it creates files, named after the URIs, that contain the static page content.
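A minimal sketch of the downloader described above, using Python's `queue.Queue` and `threading` in place of the project's .NET threads. Assumptions: `fetch` is a hypothetical callable returning the page text or `None`, file names are derived from the URI by replacing unsafe characters (two URIs could in principle map to the same name), and failed URIs are collected in a list, loosely mirroring the separate Errors view mentioned for the proposed system.

```python
import os
import queue
import re
import threading
from typing import Callable, Optional


def uri_to_filename(uri: str) -> str:
    """Derive a file name from a URI by replacing unsafe characters."""
    return re.sub(r"[^A-Za-z0-9._-]", "_", uri)[:200] + ".html"


def download_all(uris: list[str],
                 fetch: Callable[[str], Optional[str]],
                 out_dir: str,
                 threads: int = 4) -> list[str]:
    """Put every URI on one shared queue and let `threads` workers drain it.
    Each downloaded page is saved in out_dir; failed URIs are returned."""
    os.makedirs(out_dir, exist_ok=True)     # the user-chosen output folder
    work: "queue.Queue[str]" = queue.Queue()
    for uri in uris:
        work.put(uri)
    errors: list[str] = []
    lock = threading.Lock()

    def worker() -> None:
        while True:
            try:
                uri = work.get_nowait()      # take the next URI in the queue
            except queue.Empty:
                return                       # queue drained: thread ends
            body = fetch(uri)
            if body is None:
                with lock:
                    errors.append(uri)       # kept for a separate errors view
                continue
            path = os.path.join(out_dir, uri_to_filename(uri))
            with open(path, "w", encoding="utf-8") as f:
                f.write(body)

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return sorted(errors)
```

Because every URI is queued before the threads start, a worker can stop as soon as `get_nowait` finds the queue empty. A crawler that keeps discovering new links while it runs would instead need blocking `get` calls plus some shutdown signal.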

    Aim and Scope of the project:

    On Demand Web Spidering and Fetching aims to develop a user interface that retrieves information about a given website. It is a multithreaded Windows application that downloads and stores the Uniform Resource Identifiers of that website. The application is intended for use as a back-end processing component of a search engine: the results gathered by the Website Fetcher are passed to an indexer, which indexes the page data so that search queries return results faster. Once implemented, the system can connect to websites and download data that, after indexing, can be supplied to the search engine.



    Project scope

    When we look at search engines, the first question that arises is how they are able to display information in the form of so many links. The answer is that the search engine already holds information about different websites and their corresponding links, called URIs, in its database. The next, more interesting question is how the search engine obtains the information about all those sites. With this in mind, we develop a user interface that retrieves information about a given website. Once we know how to obtain information about one website, the same approach can be extended to all websites.

    SYSTEM ANALYSIS


    SOFTWARE REQUIREMENTS SPECIFICATION

    Problem Definition:

    The On Demand Web Spidering and Fetching system is a multithreaded Windows application that downloads and stores the Uniform Resource Identifiers (URIs) of Web pages for a Web search engine. Roughly, a crawler starts off by placing an initial set of URLs, S0, in a queue, where all URLs to be retrieved are kept and prioritized. From this queue, the crawler gets a URL (in some order), downloads the page, extracts any URLs in the downloaded page, and puts the new URLs in the queue. This process is repeated until the crawler decides to stop. Collected pages are later used for other applications, such as a Web search engine or a Web cache.

    Aim and Scope of the project:

    On Demand Web Spidering and Fetching aims to develop a user interface that retrieves information about a given website. It is a multithreaded Windows application that downloads and stores the Uniform Resource Identifiers of that website. The application is intended for use as a back-end processing component of a search engine: the results gathered by the Website Fetcher are passed to an indexer, which indexes the page data so that search queries return results faster. Once implemented, the system can connect to websites and download data that, after indexing, can be supplied to the search engine.

    Why this project was chosen:

    With the growing number of industries and the competition for quality and profit, companies around the globe are on the lookout for a platform to showcase their products in the most convenient, profitable and successful way.

    The application is developed in C# on the .NET Framework. It avoids the management issues faced in the existing system and offers fast access to data, better management and a good security mechanism.

    Reasons for choosing Front-End and Back-End:

    The application is developed in C# on the .NET Framework, with SQL Server 2005 as the back end. The .NET Framework is a computing platform that simplifies application development in the highly distributed environment of the Internet. It is designed to fulfill the following objectives:

    To provide a consistent object-oriented programming environment whether

    object code is stored and executed locally, executed locally but Internet-

    distributed, or executed remotely.

    To provide a code-execution environment that minimizes software deployment

    and versioning conflicts.


    To provide a code-execution environment that guarantees safe execution of

    code, including code created by an unknown or semi-trusted third party.

    To provide a code-execution environment that eliminates the performance

    problems of scripted or interpreted environments.

    To make the developer experience consistent across widely varying types of

    applications, such as Windows-based applications and Web-based

    applications.

    To build all communication on industry standards to ensure that code based on

    the .NET Framework can integrate with any other code.

    The .NET Framework can be hosted by unmanaged components that load the

    common language runtime into their processes and initiate the execution of

    managed code, thereby creating a software environment that can exploit both

    managed and unmanaged features.

    Existing System:

    As the size of the Web grows, it becomes more difficult to retrieve the whole or a significant portion of the Web's information using a single process.

    For this reason, the download rate is low and the download time is long.

    Static pages are not usually stored by search engines.

    Error pages, when encountered, are not stored separately.


    Proposed System:

    When we look at search engines, the first question that arises is how they display information in the form of so many links.

    The answer is that the search engine already holds information about different websites and their corresponding links, called URIs, in its database.

    The next, more interesting question is how the search engine obtains information about all those sites.

    Taking this into consideration, we develop a user interface that retrieves information about a given website.

    Once we know how to obtain information about one website, the same approach can be extended to all websites.

    Multiple processes are run in parallel to perform the above task, so that download

    rate is maximized and downloading time is minimized.


    We refer to this type of fetcher as a parallel crawler.

    Static pages are stored in a folder chosen by the user.

    Any difficulties encountered can be viewed separately in the Errors view.

    SOFTWARE & HARDWARE SPECIFICATIONS

    SOFTWARE SPECIFICATIONS:

    Microsoft .NET Framework

    Microsoft C#.NET

    Microsoft Windows 2000

    Microsoft Visual Studio 2005

    SQL Server 2005

    HARDWARE SPECIFICATIONS:

    PROCESSOR: P4

    RAM: 2 MB

    HARD DISK: 60 GB

    OPERATING SYSTEM: Windows 2000 or higher


    Feasibility Study

    A feasibility study is a high-level capsule version of the entire process, intended to answer questions such as: What is the problem? Is there a feasible solution to it? Is the problem even worth solving? The feasibility study is conducted once the problem is clearly understood, to determine whether the proposed system is feasible in terms of technical, operational and economic factors. A detailed feasibility study gives the management a clear view of the proposed system.

    The following feasibilities are considered to ensure that the project is viable and has no major obstructions:

    Technical Feasibility

    Economical Feasibility

    Operational Feasibility


    Technical Feasibility

    In this step, we verify whether the proposed system is technically feasible, i.e., whether all the technologies required to develop it are readily available. Technical feasibility determines whether the organization has the technology and skills necessary to carry out the project and, if not, how they should be obtained. The system is feasible on the following grounds:

    All the necessary technology exists to develop the system.

    The system is flexible and can be extended further.

    Economical Feasibility

    In this step, we verify which proposal is more economical by comparing the financial benefits of the new system with the investment. The new system is economically feasible only when the financial benefits exceed the investment and expenditure. Economic feasibility determines whether the project goal can be achieved within the resource limits allocated to it, and whether it is worthwhile to proceed with the entire project or the benefits of the new system are not worth the costs. Financial benefits must equal or exceed the costs. In this respect, we should consider:

    The cost of conducting a full system investigation.

    The cost of hardware and software for the class of application being considered.

    The cost of the development tools.

    The cost of maintenance, etc.


    OPERATIONAL FEASIBILITY

    Proposed projects are beneficial only if they can be turned into information systems that meet the organization's operating requirements. Simply stated, this test of feasibility asks whether the system will work when it is developed and installed, and whether there are major barriers to implementation. The following questions help test the operational feasibility of a project:

    Is there sufficient support for the project from management and from users? If the current system is well liked and used to the extent that people see no reason for change, there may be resistance.

    Are the current business methods acceptable to the users? If they are not, users may welcome a change that brings about a more operational and useful system.

    Have the users been involved in the planning and development of the project? Early involvement reduces the chances of resistance to the system and in general increases the likelihood of a successful project.

    Since the proposed system was meant to reduce the hardships encountered in the existing manual system, it was considered operationally feasible.


    SYSTEM DESIGN


    DATABASE DESIGN

    The data pertaining to the proposed system is so voluminous that a careful design of the database must precede storing the data in it.

    A database management system (DBMS) provides flexibility in the storage and retrieval of data and the production of information. The DBMS is a bridge between the application program, which determines what data are needed and how they are processed, and the operating system of the computer, which is responsible for placing data on the magnetic storage devices. A schema defines the database, and a subschema defines the portion of the database that a specific program will use.

    TYPES OF DATABASE DESIGN

    CONCEPTUAL SCHEMA

    Once a database designer is aware of the data that is to be stored within the database, they must determine the dependencies within the data. Sometimes, changing one piece of data implicitly changes other data that is not visible. For example, consider a list of names and addresses in which multiple people can share the same address but no person can have more than one address. Here the address is dependent upon the name: knowing a person's name determines their address, so if two addresses differ, the associated names must differ too. The other way around does not hold, because one address can belong to several people; one attribute can change without the other changing.
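Whether one attribute depends on another can be checked mechanically: a functional dependency X → Y holds when any two rows that agree on X also agree on Y. The Python sketch below applies that test to the name/address example; the sample rows are invented for illustration.

```python
def holds(rows: list[dict], lhs: str, rhs: str) -> bool:
    """True if the functional dependency lhs -> rhs holds in rows:
    any two rows that agree on lhs also agree on rhs."""
    seen: dict = {}
    for row in rows:
        key, value = row[lhs], row[rhs]
        if key in seen and seen[key] != value:
            return False          # same lhs value, different rhs value
        seen[key] = value
    return True


people = [
    {"name": "Ravi", "address": "12 MG Road"},
    {"name": "Anil", "address": "12 MG Road"},    # two people share an address
    {"name": "Sita", "address": "4 Park Street"},
]
print(holds(people, "name", "address"))  # True: each name has one address
print(holds(people, "address", "name"))  # False: one address, two names
```

A check on sample data can only refute a dependency; whether it holds in general is a rule of the domain that the designer has to confirm.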

    LOGICALLY STRUCTURING DATA

    Once the relationships and dependencies amongst the various pieces of information have

    been determined, it is possible to arrange the data into a logical structure which can then

    be mapped into the storage objects supported by the database management system. In the

    case of relational databases the storage objects are tables which store data in rows and

    columns. Each table may represent an implementation of either a logical object or a

    relationship joining one or more instances of one or more logical objects. Relationships

    between tables may then be stored as links connecting child tables with parents. Since

    complex logical relationships are themselves tables they will probably have links to

    more than one parent. In an Object database the storage objects correspond directly to

    the objects used by the Object-oriented programming language used to write the

    applications that will manage and access the data. The relationships may be defined as

    attributes of the object classes involved or as methods that operate on the object classes.

    PHYSICAL DATABASE DESIGN

    The physical design of the database specifies the physical configuration of the database on the storage media. This includes detailed specification of data elements, data types, indexing options and other parameters residing in the DBMS data dictionary. It is the detailed design of the system, including its modules and the database's hardware and software specifications.

    THE DESIGN PROCESS

    The design process consists of the following steps:

    Determine the purpose of your database - This helps prepare you for the

    remaining steps.

    Find and organize the information required - Gather all of the types of

    information you might want to record in the database, such as product

    name and order number.

    Divide the information into tables - Divide your information items into major entities or subjects, such as Products or Orders. Each subject then becomes a table.

    Turn information items into columns - Decide what information you

    want to store in each table. Each item becomes a field, and is

    displayed as a column in the table. For example, an Employees table

    might include fields such as Last Name and Hire Date.

    Specify primary keys - Choose each table's primary key. The primary key is a column that is used to uniquely identify each row. An example might be Product ID or Order ID.

    Set up the table relationships - Look at each table and decide how the

    data in one table is related to the data in other tables. Add fields to

    tables or create new tables to clarify the relationships, as necessary.


    Refine your design - Analyze your design for errors. Create the tables and

    add a few records of sample data. See if you can get the results you

    want from your tables. Make adjustments to the design, as needed.

    Apply the normalization rules - Apply the data normalization rules to

    see if your tables are structured correctly. Make adjustments to the

    tables, as needed.

    DETERMINING DATA TO BE STORED

    In a majority of cases, a person who is doing the design of a database is a person with

    expertise in the area of database design, rather than expertise in the domain from which

    the data to be stored is drawn e.g. financial information, biological information etc.

    Therefore, the data to be stored in the database must be determined in cooperation with a person who does have expertise in that domain and who is aware of what data must be stored within the system. This process is generally considered part of requirements analysis, and it requires skill on the part of the database designer to elicit the needed information from those with the domain knowledge, because they frequently cannot express clearly what their system requirements for the database are, being unaccustomed to thinking in terms of the discrete data elements that must be stored. The data to be stored can be determined from the requirements specification.

    Data Model


    The organization of the data is represented by a data model, which identifies the logical organization of the data. In a model of the real world, similar things are usually grouped into classes of objects called object types.

    A data model is a pattern according to which data are logically organized. It consists of named logical units of data and expresses the relationships among the data as determined by the interpretation of the model of the real world.

    The three classical data models are:

    Network Model

    Hierarchical data model

    Relational Data model

    Relational data Model

    The relational data model is a formal model for representing relationships

    among attributes of an entity set and the association between entity sets.

    In the relational data model, all attribute relationships and all associations are represented as relations. There is no distinction, even at the model level, between the different kinds of relations; syntactically, all relations are the same. The model does not provide for introducing additional semantic information to distinguish different relations according to their properties.


    NORMALIZATION

    Normalization theory is built around the concept of normal forms. A relation is said to be in a particular normal form if it satisfies a certain specified set of constraints.

    FIRST NORMAL FORM

    A relation R is in first normal form if and only if all underlying domains contain atomic values only.

    SECOND NORMAL FORM

    A relation R is in second normal form if and only if it is in first normal form and every non-key attribute is fully dependent on the primary key.

    THIRD NORMAL FORM

    A relation R is in third normal form if and only if it is in second normal form and every non-key attribute is non-transitively dependent on the primary key.


    Data Dictionary

    Thread count:

        Column name          Data type       Allow nulls
        id                   int             Yes

    Website log:

        Column name          Data type       Allow nulls
        Websiteid_url_id     int             No
        Website id           Varchar(150)    Yes
        url name             Varchar(500)    Yes

    Website url:

        Column name          Data type       Allow nulls
        Websiteid_url_id     int             No
        Website id           Varchar(150)    Yes
        url name             Varchar(500)    Yes
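    As an illustration of the Website url table, the following C# sketch builds an equivalent in-memory System.Data.DataTable. It is a minimal sketch, not the project's schema code. The class name WebsiteUrlSchema is hypothetical, and spaces in the column names are replaced by underscores. Treating Websiteid_url_id as the primary key is an assumption, based on it being the only column that does not allow nulls.

```csharp
using System;
using System.Data;

public static class WebsiteUrlSchema
{
    // Hypothetical helper: mirrors the "Website url" data dictionary entry.
    public static DataTable Create()
    {
        var table = new DataTable("WebsiteUrl");

        // Websiteid_url_id: int, nulls not allowed; assumed to be the primary key.
        DataColumn id = table.Columns.Add("Websiteid_url_id", typeof(int));
        id.AllowDBNull = false;
        table.PrimaryKey = new[] { id };

        // Website id: Varchar(150), nulls allowed (the default for new columns).
        DataColumn site = table.Columns.Add("Website_id", typeof(string));
        site.MaxLength = 150;

        // url name: Varchar(500), nulls allowed.
        DataColumn url = table.Columns.Add("url_name", typeof(string));
        url.MaxLength = 500;

        return table;
    }
}
```

    DataTable enforces MaxLength and the primary-key constraint when rows are added. This gives a quick way to check sample data against the data dictionary before a database exists.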


    UML DIAGRAM

    Use case Diagrams:

    The main purpose of a use case diagram is to show which system functions are performed for which actors. The roles of the actors in the system can also be depicted. A use case diagram mainly specifies the behavior of the system.

    Use case diagrams depict:

    Use cases. A use case describes a sequence of actions that provide

    something of measurable value to an actor and is drawn as a horizontal

    ellipse.

    Actors. An actor is a person, organization, or external system that plays a

    role in one or more interactions with your system. Actors are drawn as

    stick figures.

    Associations. Associations between actors and use cases are indicated in use case diagrams by solid lines. An association exists whenever an actor is involved with an interaction described by a use case. Associations are modeled as lines connecting use cases and actors to one another, with an optional arrowhead on one end of the line. The arrowhead is often used to indicate the direction of the initial invocation of the relationship or to indicate the primary actor within the use case. Arrowheads are often confused with data flow, and as a result their use is avoided.

    Class Diagrams

    A class diagram in the Unified Modeling Language (UML) is a type of static structure diagram that describes the structure of a system by showing the system's classes, their attributes, and the relationships between the classes.

    Relationships:

    A relationship is a general term covering the specific types of logical connections

    found on class and objects diagrams. UML shows the following relationships:


    Link

    A Link is the basic relationship among objects. It is represented as a line

    connecting two or more object boxes. It can be shown on an object diagram or class

    diagram. A link is an instance of an association.

    Association

    Class diagram example of association between two classes

    An Association represents a family of links. Binary associations (with two ends)

    are normally represented as a line, with each end connected to a class box. Higher order

    associations can be drawn with more than two ends. In such cases, the ends are connected

    to a central diamond.

    An association can be named, and the ends of an association can be adorned with

    role names, ownership indicators, multiplicity, visibility, and other properties. There are

    five different types of association. Bi-directional and uni-directional associations are the

    most common ones. For instance, a flight class is associated with a plane class bi-directionally. Associations can only be shown on class diagrams.

    Example: "department offers courses", is an association relationship.

    Aggregation

    Class diagram showing Aggregation between two classes

    Aggregation is a variant of the "has a" or association relationship; aggregation is

    more specific than association. It is an association that represents a part-whole

    relationship. As a type of association, an aggregation can be named and have the same

    adornments that an association can. However, an aggregation may not involve more than

    two classes.

    Composition

    Class diagram showing Composition between two classes at top and Aggregation

    between two classes at bottom

    Composition is a stronger variant of the "has a" or association relationship;

    composition is more specific than aggregation. It is represented with a solid diamond

    shape.


    The UML graphical representation of a composition relationship is a filled

    diamond shape on the containing class end of the tree of lines that connect contained

    class(es) to the containing class.

    Description:

    The class diagram shows each class with its name, properties and methods. The WebsiteFetcher class contains get URLs(), set URLs(), connect() and crawl(). The Crawler class contains the methods ParseURIs() and runthread().

    Crawler: ParseURI's(), runthread()

    Settings: MIMEtype(), outputpath()

    WebsiteFetcher: get URL's(), set URL's(), connect(), crawl()

    Queue: add(), save(), open()
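    The Queue and Crawler classes in the diagram could be sketched in C# as follows. This is a minimal sketch under stated assumptions, not the project's source. The names UrlQueue and TryTake are hypothetical, as is the regular expression used by ParseURIs. save(), open() and runthread() are omitted. A production crawler would use an HTML parser rather than a regex.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical version of the diagram's Queue class (add() only; save()/open() omitted).
public class UrlQueue
{
    private readonly Queue<Uri> pending = new Queue<Uri>();
    private readonly HashSet<string> seen = new HashSet<string>();

    // Enqueues a URL the first time it is seen; returns false for duplicates.
    public bool Add(Uri url)
    {
        if (!seen.Add(url.AbsoluteUri))
            return false;
        pending.Enqueue(url);
        return true;
    }

    // Removes the next URL to crawl, if any.
    public bool TryTake(out Uri url)
    {
        if (pending.Count == 0)
        {
            url = null;
            return false;
        }
        url = pending.Dequeue();
        return true;
    }

    public int Count => pending.Count;
}

// Hypothetical version of the diagram's Crawler.ParseURIs().
public static class Crawler
{
    // Rough href extractor; real pages need an HTML parser.
    private static readonly Regex Href = new Regex(
        "href\\s*=\\s*[\"']([^\"'#]+)[\"']", RegexOptions.IgnoreCase);

    // Yields absolute http/https URIs, resolving relative links against baseUri.
    public static IEnumerable<Uri> ParseURIs(Uri baseUri, string html)
    {
        foreach (Match m in Href.Matches(html))
        {
            if (Uri.TryCreate(baseUri, m.Groups[1].Value, out Uri link)
                && (link.Scheme == Uri.UriSchemeHttp || link.Scheme == Uri.UriSchemeHttps))
            {
                yield return link;
            }
        }
    }
}
```

    Keeping the "seen" set inside the queue means each page is downloaded at most once, even when many pages link to it.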


    USECASE DIAGRAMS


    USE CASE DIAGRAM:

    Description:

    The diagram above is a use case diagram. In it, the actor performs the following actions: connecting to the WWW, getting URLs, classifying threads, URLs and errors, and storing static pages.

    CONFIGURATOR MODULE:

    Diagram labels: Admin; Connecting www; URL reading (getting URLs); Classifying threads, URLs, errors; Storing static pages; Settings.


    Description:

    The diagram above is a use case diagram. In it, the actor performs the following actions: setting MIME types, which consists of selecting the data; output settings, which specify the path of the output folder; and advanced settings, which consist of restricting data.

    CRAWLER VIEW:


    Description:

    The diagram above is a use case diagram. In it, the actor works with three views: first the thread view, then the request view, and finally the error view.

    MULTITHREADED DOWNLOADER:


    Description:

    The diagram above is a use case diagram. In it, the actor first starts the threads. Next, the threads work through the queue and download the information. Finally, the downloaded content is stored in a folder.

    SEQUENCE DIAGRAM


    A sequence diagram in the Unified Modelling Language (UML) is a kind of interaction diagram that shows how processes operate with one another and in what order. It is a construct of a Message Sequence Chart.

    Sequence diagrams are sometimes called Event-trace diagrams, event scenarios,

    and timing diagrams.

    A sequence diagram shows, as parallel vertical lines ("lifelines"), different

    processes or objects that live simultaneously, and, as horizontal arrows, the messages

    exchanged between them, in the order in which they occur. This allows the specification

    of simple runtime scenarios in a graphical manner.

    Collaboration Diagram:

    Objects: Admin, Website, Threads, Errors view, URLs, Output path, Static pages, Settings. Messages: connect, configure, errors, start, get, store, specify, perform.


    SYSTEM TESTING

    TESTING


    Software Testing and Quality Assurance:

    Software Testing: Software testing is a critical element of software quality assurance and represents the ultimate review of specification, design and coding. If testing is conducted successfully, it will uncover errors in the software. As a secondary benefit, testing demonstrates that software functions appear to be working according to the specification and that performance requirements appear to have been met.

    Why should we do testing?

    An error-free, superior product

    Quality assurance to the client

    Black-box testing takes an external perspective of the test object to derive test cases. These

    tests can be functional or non-functional, though usually functional. The test designer

    selects valid and invalid inputs and determines the correct output. There is no knowledge

    of the test object's internal structure.

    This method of test design is applicable to all levels of software testing: unit,

    integration, functional testing, system and acceptance. The higher the level, and hence the

    bigger and more complex the box, the more one is forced to use black box testing to

    simplify. While this method can uncover unimplemented parts of the specification, one

    cannot be sure that all existent paths are tested.
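    A small black-box example for the address-bar input follows. The test cases list only inputs and expected outputs, with no knowledge of how the validator works. It is a sketch under assumptions. UrlInput.IsValidStartUrl is a hypothetical validator that accepts absolute http/https URLs and rejects blank input; it is not the project's actual code.

```csharp
using System;

public static class UrlInput
{
    // Hypothetical validator: accepts only absolute http/https URLs, rejects blank input.
    public static bool IsValidStartUrl(string text)
    {
        if (string.IsNullOrWhiteSpace(text))
            return false;
        return Uri.TryCreate(text.Trim(), UriKind.Absolute, out Uri uri)
            && (uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps);
    }
}

public static class BlackBoxTests
{
    // Valid and invalid inputs with expected results; no knowledge of internals.
    private static readonly (string Input, bool Expected)[] Cases =
    {
        ("http://www.example.com/", true),    // valid
        ("https://example.com/a?b=1", true),  // valid, with query
        ("", false),                          // blank
        ("   ", false),                       // whitespace only
        ("www.example.com", false),           // no scheme
        ("ftp://example.com/", false),        // unsupported scheme
    };

    // Runs every case and returns the number of failures.
    public static int Run()
    {
        int failures = 0;
        foreach (var (input, expected) in Cases)
        {
            bool actual = UrlInput.IsValidStartUrl(input);
            if (actual != expected)
            {
                Console.WriteLine($"FAIL: \"{input}\" expected {expected}, got {actual}");
                failures++;
            }
        }
        return failures;
    }
}
```

    Grouping inputs into valid and invalid classes (blank, missing scheme, wrong scheme) is what lets a black-box tester cover the specification without seeing the code.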

    System Testing:

    As more critical business functions are automated, more and more trust is placed in automated systems. This puts an ever-increasing burden on system analysts to ensure the quality of these systems. Quality depends on design, development, testing and implementation, and weakness in any of these areas will seriously jeopardize it.

    Testing:


    No system is ever perfect. Communication problems, programmer negligence or time constraints create errors that must be eliminated before the system is delivered to the users. A system is tested for online response, volume of transactions, stress, recovery from failure and durability.

    Testing strategy:

    From a procedural point of view, testing is actually a series of three steps that are implemented sequentially. Initially, each module is tested individually to ensure that it functions properly as a unit; hence this is called UNIT TESTING. Next, the modules must be assembled or integrated, and a set of higher-order tests is conducted.

    Unit Testing:

    Unit testing focuses verification effort on the smallest unit of software design, the module. Using the design document, important control paths are tested to uncover errors within the boundary of the module.

    Unit testing is basically of two types: white-box testing and black-box testing. Structural testing is also referred to as white-box or glass-box testing.

    Integration Testing:

    Integration testing is a systematic technique for constructing the program structure while at the same time conducting tests to uncover errors associated with interfacing. The objective is to take unit-tested modules and build the program structure that has been dictated by the design.

    Top-Down Integration:


    Top-down integration is an incremental approach to the construction of program structure. Modules are integrated by moving downward through the control hierarchy, beginning with the main control module and proceeding to the subordinate modules.
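    In top-down integration, a subordinate module that is not yet integrated is replaced by a stub. This lets the main control module be tested first. The sketch below shows the idea. All names are hypothetical: ILinkSource stands in for the download-and-parse module, StubLinkSource returns canned links instead of using the network, and CrawlController is the control module under test.

```csharp
using System;
using System.Collections.Generic;

// Interface of a subordinate module: returns the links found on a page.
public interface ILinkSource
{
    IReadOnlyList<Uri> GetLinks(Uri page);
}

// Stub used during top-down integration: canned links, no network access.
public class StubLinkSource : ILinkSource
{
    private readonly Dictionary<string, Uri[]> links = new Dictionary<string, Uri[]>();

    public void Add(string page, params string[] targets)
    {
        links[new Uri(page).AbsoluteUri] = Array.ConvertAll(targets, t => new Uri(t));
    }

    public IReadOnlyList<Uri> GetLinks(Uri page) =>
        links.TryGetValue(page.AbsoluteUri, out Uri[] found) ? found : Array.Empty<Uri>();
}

// Main control module under test: breadth-first crawl, each URL visited once.
public class CrawlController
{
    private readonly ILinkSource source;

    public CrawlController(ILinkSource source)
    {
        this.source = source;
    }

    public List<Uri> Crawl(Uri start, int maxPages)
    {
        var visited = new List<Uri>();
        var seen = new HashSet<string> { start.AbsoluteUri };
        var queue = new Queue<Uri>();
        queue.Enqueue(start);

        while (queue.Count > 0 && visited.Count < maxPages)
        {
            Uri page = queue.Dequeue();
            visited.Add(page);
            foreach (Uri link in source.GetLinks(page))
            {
                if (seen.Add(link.AbsoluteUri))
                    queue.Enqueue(link);
            }
        }
        return visited;
    }
}
```

    When the real downloader module is ready, it implements the same interface and replaces the stub. The controller's tests stay unchanged.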

    Bottom-Up Integration:

    Bottom-up integration testing, as its name implies, begins construction and testing with atomic modules and moves towards the higher-level modules. The type of integration depends on the application under development.

    Information systems follow a top-down integration mechanism, moving from one level to another in a hierarchical format.

    6.4) Validation Testing:

    This test mainly concentrates on the software requirements specification, a document that describes the user-visible attributes of the software. According to I.S., validation testing checks whether the end user who has entered is an authorized user; if not, the user is not allowed to log in. When the requirement is submitted, care is taken that the user does not leave it null: the value entered should not be blank.


    Test Case:

    Test case 1: Valid Input Priority (H, L): High

    Test Objective: For Getting Accurate Output

    Test Description: The user needs to enter a valid URL or URI in the address bar provided in our application.

    Requirements Verified: Yes

    Test Environment: The .NET IDE must be running; threads should be invoked as soon as we click the Go button.

    Test Setup/Pre-Conditions: The .NET IDE should be running. An address must be entered.

    Action: The user selects the Go button to get all the static content present at that URL/URI.

    Expected Result: The service is found, and the files are downloaded and stored on our local desktop; otherwise, an error message should pop up.

    Pass: Conditions pass: No Fail: No

    Problems / Issues: NIL

    Notes: Successfully Executed


    SYSTEM

    IMPLEMENTATION

    Design pattern Implementation

    Implementing the Singleton Pattern in C#

    The singleton pattern is one of the best-known patterns in software engineering.

    Essentially, a singleton is a class which only allows a single instance of itself to be

    created, and usually gives simple access to that instance. Most commonly, singletons

    don't allow any parameters to be specified when creating the instance - as otherwise a

    second request for an instance but with a different parameter could be problematic! (If the

    same instance should be accessed for all requests with the same parameter, the factory

    pattern is more appropriate.) This section deals only with the situation where no parameters are required. Typically, a requirement of singletons is that they are created lazily, i.e. that the instance isn't created until it is first needed.

    There are various ways of implementing the singleton pattern in C#. They range from the most commonly seen version, which is not thread-safe, up to a fully lazily-loaded, thread-safe, simple and highly performant version. Note that the code here omits the private modifier, as it is the default for class members. In many other languages, such as Java, there is a different default, and private should be used.

    All these implementations share four common characteristics, however:

    A single constructor, which is private and parameterless. This prevents other classes from instantiating it (which would be a violation of the pattern). Note that it also prevents subclassing: if a singleton can be subclassed once, it can be subclassed twice, and if each of those subclasses can create an instance, the pattern is violated. The factory pattern can be used if you need a single instance of a base type, but the exact type isn't known until runtime.

    The class is sealed. This is unnecessary, strictly speaking, due to the above point,

    but may help the JIT to optimize things more.

    A static variable which holds a reference to the single created instance, if any.

    A public static means of getting the reference to the single created instance,

    creating one if necessary.

    Note that all of these implementations also use a public static property Instance as the

    means of accessing the instance. In all cases, the property could easily be converted to a

    method, with no impact on thread-safety or performance.

    The following sample is the most commonly seen implementation, which is not thread-safe:

    // Bad code! Do not use!
    // Not thread-safe: two threads can both see instance == null
    // and each create a separate Singleton.
    public sealed class Singleton
    {
        static Singleton instance = null;

        Singleton()
        {
        }

        public static Singleton Instance
        {
            get
            {
                if (instance == null)
                {
                    instance = new Singleton();
                }
                return instance;
            }
        }
    }
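    For comparison, a thread-safe lazy version can be built on System.Lazy<T>. Its default thread-safety mode (LazyThreadSafetyMode.ExecutionAndPublication) guarantees that the factory runs only once, even when several threads request the value at the same time. This is one commonly recommended approach, sketched here rather than taken from the project's code. It keeps the four characteristics listed above: a private parameterless constructor, a sealed class, a static reference and a public static Instance property.

```csharp
using System;

public sealed class Singleton
{
    // Lazy<T> with the default thread-safety mode creates the instance exactly once,
    // on first access to Value, even under concurrent access.
    private static readonly Lazy<Singleton> lazy =
        new Lazy<Singleton>(() => new Singleton());

    public static Singleton Instance => lazy.Value;

    // Private parameterless constructor: no other class can create an instance.
    private Singleton()
    {
    }
}
```

    Nothing is created until Instance is first read, so the laziness requirement is also met without any explicit locking.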

    Sample code

    private void listViewRequests_SelectedIndexChanged(object sender, System.EventArgs e)
    {
        // Show the request text (third column) of the selected row.
        if (this.listViewRequests.SelectedItems.Count == 0)
            return;
        ListViewItem item = this.listViewRequests.SelectedItems[0];
        if (item.SubItems.Count > 2)
            this.textBoxRequest.Text = item.SubItems[2].Text;
    }

    // Load the "Allow all MIME types" option from the settings (default: true).
    AllMIMETypes = Settings.GetValue("Allow all MIME types", true);

    // MIME types are the types that the crawler supports downloading;
    // the crawler includes a set of default types.
    private bool bAllMIMETypes;

    private bool AllMIMETypes
    {
        get { return bAllMIMETypes; }
        set { bAllMIMETypes = value; }
    }

    // Construct the MIME types string from the Settings.xml file.
    // Requires System.IO (File), System.Windows.Forms (Application) and System.Xml.
    // Each checked entry with at least two fields becomes "type;" or "type[min,max];".
    static string GetMIMETypes()
    {
        string str = "";
        string path = Application.StartupPath + "\\Settings.xml";
        // Check that the settings XML file exists.
        if (File.Exists(path))
        {
            XmlDocument doc = new XmlDocument();
            doc.Load(path);
            XmlNode element = doc.DocumentElement.SelectSingleNode("SettingsForm-listViewFileMatches");
            if (element != null)
            {
                for (int n = 0; n < element.ChildNodes.Count; n++)
                {
                    XmlNode xmlnode = element.ChildNodes[n];
                    // Skip entries that are not checked in the settings form.
                    XmlAttribute attribute = xmlnode.Attributes["Checked"];
                    if (attribute == null || attribute.Value.ToLower() != "true")
                        continue;
                    // Fields are tab-separated: type, min size, max size.
                    // (' \t' is not a valid char literal; a single tab is used.)
                    string[] items = xmlnode.InnerText.Split('\t');
                    if (items.Length > 1)
                    {
                        str += items[0];
                        if (items.Length > 2)
                            str += "[" + items[1] + "," + items[2] + "]";
                        str += ";";
                    }
                }
            }
        }
        return str;
    }

    private void button7_Click(object sender, EventArgs e)
    {
        // Open the file type dialog and add the new type with its min/max size.
        FileTypeForm form = new FileTypeForm();
        if (form.ShowDialog() == DialogResult.OK)
        {
            ListViewItem item = this.listViewFileMatches.Items.Add(form.textBoxTypeDescription.Text);
            item.SubItems.Add(form.numericUpDownMinSize.Value.ToString());
            item.SubItems.Add(form.numericUpDownMaxSize.Value.ToString());
        }
    }

    private void button8_Click(object sender, EventArgs e)
    {
        // Edit the selected file type: pre-fill the dialog with its current values.
        if (this.listViewFileMatches.SelectedItems.Count == 0)
            return;
        ListViewItem item = this.listViewFileMatches.SelectedItems[0];
        FileTypeForm form = new FileTypeForm();
        form.textBoxTypeDescription.Text = item.Text;
        if (item.SubItems.Count > 1)
            form.numericUpDownMinSize.Value = int.Parse(item.SubItems[1].Text);
        if (item.SubItems.Count
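    The following sketch shows how the "type[min,max];" string built by GetMIMETypes might be consumed. It checks a downloaded resource against that string. Everything here is an assumption rather than the project's code. The class and method names are hypothetical. The bounds are treated as an inclusive size range in the same units as the size argument, and the allowAll parameter stands in for the AllMIMETypes setting.

```csharp
using System;

public static class MimeFilter
{
    // Hypothetical check of a resource against a "type[min,max];" list.
    // Entries without bounds accept any size; bounds are treated as inclusive.
    public static bool IsAllowed(string mimeTypes, bool allowAll, string contentType, long size)
    {
        if (allowAll)
            return true; // corresponds to the "Allow all MIME types" setting

        foreach (string entry in mimeTypes.Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries))
        {
            string type = entry;
            long min = 0, max = long.MaxValue;

            int open = entry.IndexOf('[');
            if (open >= 0)
            {
                type = entry.Substring(0, open);
                string[] bounds = entry.Substring(open + 1).TrimEnd(']').Split(',');
                min = long.Parse(bounds[0]);
                max = long.Parse(bounds[1]);
            }

            if (string.Equals(type.Trim(), contentType, StringComparison.OrdinalIgnoreCase))
                return size >= min && size <= max;
        }
        return false; // type not in the list
    }
}
```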


    Classes: These are the predefined classes used in our project.

    Network Class

    Provides a property, event, and methods for interacting with the network to which the computer is connected.

    Syntax:

    [HostProtectionAttribute(SecurityAction.LinkDemand, Resources = HostProtectionResource.ExternalProcessMgmt)]
    public class Network

    XmlReader Class

    Represents a reader that provides fast, non-cached, forward-only access to XML data.

    Namespace: System.Xml

    Assembly: System.Xml (in System.Xml.dll)

    Syntax: public abstract class XmlReader : IDisposable

    XmlReader provides forward-only, read-only access to a stream of XML data. The

    XmlReader class conforms to the W3C Extensible Markup Language (XML) 1.0 and the

    Namespaces in XML recommendations.
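    Here is a short example of the forward-only reading style described above. The sketch makes a single pass over a settings document and collects the text of checked entries, similar in spirit to GetMIMETypes. The element name Item and the Checked attribute layout are assumptions for illustration. The loop reads the next node only when the current one is not consumed, so adjacent Item elements are not skipped.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class SettingsReader
{
    // Returns the text of every <Item> whose Checked attribute is "true",
    // reading the document once, forward-only, without building a DOM.
    public static List<string> ReadCheckedItems(string xml)
    {
        var items = new List<string>();
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.MoveToContent();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "Item")
                {
                    // Read the attribute first: ReadElementContentAsString moves past the element.
                    bool isChecked = string.Equals(reader.GetAttribute("Checked"), "true",
                        StringComparison.OrdinalIgnoreCase);
                    string text = reader.ReadElementContentAsString();
                    if (isChecked)
                        items.Add(text);
                }
                else
                {
                    reader.Read();
                }
            }
        }
        return items;
    }
}
```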

    XmlWriter Class

    Represents a writer that provides a fast, non-cached, forward-only means of generating

    streams or files containing XML data.

    Namespace: System.Xml

    Assembly: System.Xml (in System.Xml.dll)

    Syntax:

    public abstract class XmlWriter : IDisposable
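    The writing side can be sketched the same way. The example below emits a small settings document in one forward-only pass with XmlWriter.Create(StringBuilder, XmlWriterSettings). The Settings/Item/Checked layout is a hypothetical format chosen for illustration, not the project's actual Settings.xml.

```csharp
using System.Collections.Generic;
using System.Text;
using System.Xml;

public static class SettingsWriter
{
    // Writes each (type, checked) pair as <Item Checked="...">type</Item>
    // inside a <Settings> root, in one forward-only pass.
    public static string Write(IEnumerable<(string Type, bool Checked)> items)
    {
        var sb = new StringBuilder();
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
        using (XmlWriter writer = XmlWriter.Create(sb, settings))
        {
            writer.WriteStartElement("Settings");
            foreach (var (type, isChecked) in items)
            {
                writer.WriteStartElement("Item");
                writer.WriteAttributeString("Checked", isChecked ? "true" : "false");
                writer.WriteString(type); // characters such as & and < are escaped
                writer.WriteEndElement();
            }
            writer.WriteEndElement();
        } // disposing the writer flushes it into sb
        return sb.ToString();
    }
}
```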


    SCREENS


    SCREEN DESIGNING:

    THREADS VIEW:


    Description:

    First, our system establishes a connection. The user then gives one URL (Uniform Resource Locator) as input, and the system starts fetching the information at that URL by starting the thread process. In this process, 10 threads run continuously to get the information for all the URIs and store them in a queue.
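    The threads view describes 10 threads sharing one queue of URIs. A minimal sketch of that arrangement follows, under stated assumptions; it is not the project's downloader. ThreadedCrawler is a hypothetical name. The getLinks delegate stands in for "download the page and extract its URIs". The pending counter tells the workers when no queued or in-progress work remains.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;

public static class ThreadedCrawler
{
    // Crawls from `start` with `threadCount` worker threads sharing one queue.
    public static (List<string> Visited, List<string> Errors) Crawl(
        string start, Func<string, IEnumerable<string>> getLinks, int threadCount = 10)
    {
        var queue = new ConcurrentQueue<string>();
        var seen = new ConcurrentDictionary<string, bool>();
        var visited = new ConcurrentQueue<string>();
        var errors = new ConcurrentQueue<string>();
        int pending = 1; // URLs queued or in progress; 0 means the crawl is finished

        seen.TryAdd(start, true);
        queue.Enqueue(start);

        var threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++)
        {
            threads[i] = new Thread(() =>
            {
                while (Volatile.Read(ref pending) > 0)
                {
                    if (!queue.TryDequeue(out string url))
                    {
                        Thread.Sleep(1); // another thread may still add links
                        continue;
                    }
                    try
                    {
                        foreach (string link in getLinks(url))
                        {
                            // Count a child before enqueuing it so pending never drops to 0 early.
                            if (seen.TryAdd(link, true))
                            {
                                Interlocked.Increment(ref pending);
                                queue.Enqueue(link);
                            }
                        }
                        visited.Enqueue(url);
                    }
                    catch (Exception ex)
                    {
                        errors.Enqueue(url + ": " + ex.Message); // would feed the errors view
                    }
                    finally
                    {
                        Interlocked.Decrement(ref pending);
                    }
                }
            });
            threads[i].Start();
        }

        foreach (Thread t in threads)
            t.Join();
        return (new List<string>(visited), new List<string>(errors));
    }
}
```

    Failed URIs are collected separately rather than stopping the crawl. This matches how the errors view lists problems while the other threads keep working.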

    REQUEST VIEW:


    Description:

    While each URI is being downloaded, it is shown in the threads view; after the download completes, the completed URI is transferred to the request view.

    ERRORS VIEW:


    Description:

    If any difficulty or error occurs while fetching a URI corresponding to the URL, it is listed in the errors view. Information such as the internet connection status, how many URIs have been downloaded, how many errors have occurred, how much memory is available and the CPU usage is displayed in the status bar.



    MIME Types:


    Description:

    Here we set all the kinds of data we need to extract from a particular URI, for example whether or not we need to store data, Boolean data and image information.

    Edit File Type:


    Description:

    This screenshot displays the MIME type and image size can

    Connection Settings:


    Description:

    This screenshot displays how many threads are used, the thread wait time of 2 seconds and the connection timeout of 20 seconds.
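    The connection settings could be held in a small settings object, as in the sketch below. The values are taken from this screen and from the threads view: 10 threads, a 2-second thread wait and a 20-second connection timeout. The class and member names are hypothetical. The meaning of "thread waiting" is an assumption; here it is treated as a pause between two requests on the same thread. Note that HttpClient.Timeout limits the entire request, not only connection setup.

```csharp
using System;
using System.Net.Http;
using System.Threading;

// Hypothetical settings object mirroring the Connection Settings screen.
public sealed class ConnectionSettings
{
    public int ThreadCount { get; set; } = 10;
    public TimeSpan ThreadWait { get; set; } = TimeSpan.FromSeconds(2);
    public TimeSpan ConnectionTimeout { get; set; } = TimeSpan.FromSeconds(20);

    // HttpClient.Timeout limits the entire request, not only connection setup.
    public HttpClient CreateClient() => new HttpClient { Timeout = ConnectionTimeout };

    // Assumed meaning of "thread waiting": a pause between requests on one thread.
    public void WaitBetweenRequests() => Thread.Sleep(ThreadWait);
}
```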

    Output settings:


    Description:

    These settings specify the path of the output folder in which the downloaded pages are stored.

    Advanced Settings:


    Description:

    These settings let the user restrict certain kinds of websites, for example by domain name, such as .NET or .AC.IN.
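    A domain restriction of this kind could be checked as in the following sketch. It compares the host of each URL against a list of restricted suffixes such as ".NET" or ".AC.IN", ignoring case. DomainFilter is a hypothetical name. Treating unparsable URLs as restricted is a design choice of this sketch, not documented behaviour of the application.

```csharp
using System;
using System.Linq;

public static class DomainFilter
{
    // Returns true when the URL's host ends with one of the restricted suffixes
    // (for example ".NET" or ".AC.IN"), ignoring case.
    public static bool IsRestricted(string url, string[] restrictedSuffixes)
    {
        // Unparsable URLs are treated as restricted (a choice of this sketch).
        if (!Uri.TryCreate(url, UriKind.Absolute, out Uri uri))
            return true;
        string host = uri.Host;
        return restrictedSuffixes.Any(
            suffix => host.EndsWith(suffix, StringComparison.OrdinalIgnoreCase));
    }
}
```

    Matching on the host rather than on the whole URL string avoids false hits such as a path that merely contains ".net".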





    Description:

    This screenshot displays the URLs; under these URLs, some specified folders are displayed.


    Description:

    This screenshot displays the folders and pages.


    REPORTS


    Description:

    The report above displays the URI pages saved in the specified folder; all the information is stored in this folder.


    Description: This report displays the URL link and the sublinks under it.


    TECHNOLOGY

    SPECIFICATION


    .NET Framework

    The .NET Framework is an integral Windows component that supports building

    and running the next generation of applications and XML Web services. The .NET

    Framework is designed to fulfill the following objectives:

    To provide a consistent object-oriented programming environment whether object

    code is stored and executed locally, executed locally but Internet-distributed, or

    executed remotely.

    To provide a code-execution environment that minimizes software deployment

    and versioning conflicts.

    To provide a code-execution environment that promotes safe execution of code,

    including code created by an unknown or semi-trusted third party.

    To provide a code-execution environment that eliminates the performance

    problems of scripted or interpreted environments.

    To make the developer experience consistent across widely varying types of

    applications, such as Windows-based applications and Web-based applications.

    To build all communication on industry standards to ensure that code based on the

    .NET Framework can integrate with any other code.

    The .NET Framework has two main components: the common language runtime

    and the .NET Framework class library. The common language runtime is the foundation of the .NET Framework. You can think of the runtime as an agent that manages code at

    execution time, providing core services such as memory management, thread

    management, and remoting, while also enforcing strict type safety and other forms of

    code accuracy that promote security and robustness. In fact, the concept of code

    management is a fundamental principle of the runtime. Code that targets the runtime is

    known as managed code, while code that does not target the runtime is known as

    unmanaged code. The class library, the other main component of the .NET Framework, is

    a comprehensive, object-oriented collection of reusable types that you can use to develop

    applications ranging from traditional command-line or graphical user interface (GUI)

    applications to applications based on the latest innovations provided by ASP.NET, such

    as Web Forms and XML Web services.

    The .NET Framework can be hosted by unmanaged components that load the

    common language runtime into their processes and initiate the execution of managed


    code, thereby creating a software environment that can exploit both managed and

    unmanaged features. The .NET Framework not only provides several runtime hosts, but

    also supports the development of third-party runtime hosts.

    For example, ASP.NET hosts the runtime to provide a scalable, server-side

    environment for managed code. ASP.NET works directly with the runtime to enable ASP.NET applications and XML Web services.

    Internet Explorer is an example of an unmanaged application that hosts the

    runtime (in the form of a MIME type extension). Using Internet Explorer to host the

    runtime enables you to embed managed components or Windows Forms controls in

    HTML documents. Hosting the runtime in this way makes managed mobile code (similar

    to Microsoft ActiveX controls) possible, but with significant improvements that only

    managed code can offer, such as semi-trusted execution and isolated file storage.

    The following illustration shows the relationship of the common language

    runtime and the class library to your applications and to the overall system. The

    illustration also shows how managed code operates within a larger architecture.

    .NET Framework in context


    The following sections describe the main components and features of the .NET

    Framework in greater detail.

    Features of the Common Language Runtime

    The common language runtime manages memory, thread execution, code

    execution, code safety verification, compilation, and other system services. These

    features are intrinsic to the managed code that runs on the common language runtime.

    With regard to security, managed components are awarded varying degrees of

    trust, depending on a number of factors that include their origin (such as the Internet,

    enterprise network, or local computer). This means that a managed component might or

    might not be able to perform file-access operations, registry-access operations, or other

    sensitive functions, even if it is being used in the same active application.

    The runtime enforces code access security. For example, users can trust that an

    executable embedded in a Web page can play an animation on screen or sing a song, but

    cannot access their personal data, file system, or network. The security features of the


    runtime thus enable legitimate Internet-deployed software to be exceptionally feature

    rich.

    The runtime also enforces code robustness by implementing a strict type-and-

    code-verification infrastructure called the common type system (CTS). The CTS ensures

    that all managed code is self-describing. The various Microsoft and third-party language

    compilers generate managed code that conforms to the CTS. This means that managed

    code can consume other managed types and instances, while strictly enforcing type

    fidelity and type safety.

    In addition, the managed environment of the runtime eliminates many common

    software issues. For example, the runtime automatically handles object layout and

    manages references to objects, releasing them when they are no longer being used. This

    automatic memory management resolves the two most common application errors,

    memory leaks and invalid memory references.

    The runtime also accelerates developer productivity. For example, programmers

    can write applications in their development language of choice, yet take full advantage of

    the runtime, the class library, and components written in other languages by other

    developers. Any compiler vendor who chooses to target the runtime can do so. Language

    compilers that target the .NET Framework make the features of the .NET Framework

    available to existing code written in that language, greatly easing the migration process

    for existing applications.

    While the runtime is designed for the software of the future, it also supports

    software of today and yesterday. Interoperability between managed and unmanaged code

    enables developers to continue to use necessary COM components and DLLs.

    The runtime is designed to enhance performance. Although the common language runtime provides many standard runtime services, managed code is never interpreted. A feature called just-in-time (JIT) compiling enables all managed code to run in the native machine language of the system on which it is executing. Meanwhile, the memory manager reduces memory fragmentation and increases locality of reference to further improve performance.

    Finally, the runtime can be hosted by high-performance, server-side applications, such as Microsoft SQL Server and Internet Information Services (IIS). This infrastructure enables you to use managed code to write your business logic, while still benefiting from the performance of enterprise servers that support runtime hosting.

    .NET Framework Class Library

    The .NET Framework class library is a collection of reusable types that tightly integrate with the common language runtime. The class library is object oriented, providing types from which your own managed code can derive functionality. This not only makes the .NET Framework types easy to use, but also reduces the time associated with learning new features of the .NET Framework. In addition, third-party components can integrate seamlessly with classes in the .NET Framework.

    For example, the .NET Framework collection classes implement a set of interfaces that you can use to develop your own collection classes. Your collection classes will blend seamlessly with the classes in the .NET Framework.
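    As a minimal sketch of that idea, the class below implements the framework's IEnumerable<T> interface, so it works with foreach, LINQ and framework constructors such as new List<string>(...). The class name UrlFrontier and its duplicate-skipping behaviour are hypothetical, chosen to fit the crawler theme of this report. They are not taken from the Website Fetcher sources.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical collection class (illustration only): a first-in, first-out
// list of URLs that silently ignores URLs it has already accepted.
public class UrlFrontier : IEnumerable<string>
{
    private readonly Queue<string> pending = new Queue<string>();
    private readonly HashSet<string> seen = new HashSet<string>(StringComparer.Ordinal);

    // Adds a URL unless it was added before; returns true if it was new.
    public bool Add(string url)
    {
        if (url == null) throw new ArgumentNullException(nameof(url));
        if (!seen.Add(url)) return false;
        pending.Enqueue(url);
        return true;
    }

    public int Count => pending.Count;

    // Enumerates pending URLs in insertion order without removing them.
    public IEnumerator<string> GetEnumerator() => pending.GetEnumerator();

    // The non-generic IEnumerable member is required by IEnumerable<T>.
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}
```

    Because UrlFrontier implements IEnumerable<string>, `foreach (var url in frontier)` and `new List<string>(frontier)` work without any extra code. This is the "blend seamlessly" behaviour described above.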

    As you would expect from an object-oriented class library, the .NET Framework types enable you to accomplish a range of common programming tasks, including tasks such as string management, data collection, database connectivity, and file access. In addition to these common tasks, the class library includes types that support a variety of specialized development scenarios. For example, you can use the .NET Framework to develop the following types of applications and services:

    Console applications.
    Windows GUI applications (Windows Forms).
    ASP.NET applications.
    XML Web services.
    Windows services.

    For example, the Windows Forms classes are a comprehensive set of reusable types that vastly simplify Windows GUI development. If you write an ASP.NET Web Forms application, you can use the Web Forms classes.

    Client Application Development

    Client applications are the closest to a traditional style of application in Windows-based programming. These are the types of applications that display windows or forms on the desktop, enabling a user to perform a task. Client applications include applications such as word processors and spreadsheets, as well as custom business applications such as data-entry tools, reporting tools, and so on. Client applications usually employ windows, menus, buttons, and other GUI elements, and they likely access local resources such as the file system and peripherals such as printers.

    Another kind of client application is the traditional ActiveX control (now replaced by the managed Windows Forms control) deployed over the Internet as a Web page. This application is much like other client applications: it is executed natively, has access to local resources, and includes graphical elements.

    In the past, developers created such applications using C/C++ in conjunction with the Microsoft Foundation Classes (MFC) or with a rapid application development (RAD) environment such as Microsoft Visual Basic. The .NET Framework incorporates aspects of these existing products into a single, consistent development environment that drastically simplifies the development of client applications.

    The Windows Forms classes contained in the .NET Framework are designed to be used for GUI development. You can easily create command windows, buttons, menus, toolbars, and other screen elements with the flexibility necessary to accommodate shifting business needs.

    For example, the .NET Framework provides simple properties to adjust visual attributes associated with forms. In some cases the underlying operating system does not support changing these attributes directly, and in these cases the .NET Framework automatically recreates the forms. This is one of many ways in which the .NET Framework integrates the developer interface, making coding simpler and more consistent.

    Unlike ActiveX controls, Windows Forms controls have semi-trusted access to a user's computer. This means that binary or natively executing code can access some of the resources on the user's system (such as GUI elements and limited file access) without being able to access or compromise other resources. Because of code access security, many applications that once needed to be installed on a user's system can now be deployed through the Web. Your applications can implement the features of a local application while being deployed like a Web page.

    Server Application Development

    Server-side applications in the managed world are implemented through runtime hosts. Unmanaged applications host the common language runtime, which allows your custom managed code to control the behavior of the server. This model provides you with all the features of the common language runtime and class library while gaining the performance and scalability of the host server.

    The following illustration shows a basic network schema with managed code running in different server environments. Servers such as IIS and SQL Server can perform standard operations while your application logic executes through the managed code.

    [Figure: Server-side managed code]

    ASP.NET is the hosting environment that enables developers to use the .NET Framework to target Web-based applications. However, ASP.NET is more than just a runtime host; it is a complete architecture for developing Web sites and Internet-distributed objects using managed code. Both Web Forms and XML Web services use IIS and ASP.NET as the publishing mechanism for applications, and both have a collection of supporting classes in the .NET Framework.

    XML Web services, an important evolution in Web-based technology, are distributed, server-side application components similar to common Web sites. However, unlike Web-based applications, XML Web services components have no UI and are not targeted for browsers such as Internet Explorer and Netscape Navigator. Instead, XML Web services consist of reusable software components designed to be consumed by other applications, such as traditional client applications, Web-based applications, or even other XML Web services. As a result, XML Web services technology is rapidly moving application development and deployment into the highly distributed environment of the Internet.

    If you have used earlier versions of ASP technology, you will immediately notice the improvements that ASP.NET and Web Forms offer. For example, you can develop Web Forms pages in any language that supports the .NET Framework. In addition, your code no longer needs to share the same file with your HTML text (although it can continue to do so if you prefer). Web Forms pages execute in native machine language because, like any other managed application, they take full advantage of the runtime. In contrast, unmanaged ASP pages are always scripted and interpreted. ASP.NET pages are faster, more functional, and easier to develop than unmanaged ASP pages because they interact with the runtime like any managed application.

    The .NET Framework also provides a collection of classes and tools to aid in development and consumption of XML Web services applications. XML Web services are built on standards such as SOAP (a remote procedure-call protocol), XML (an extensible data format), and WSDL (the Web Services Description Language). The .NET Framework is built on these standards to promote interoperability with non-Microsoft solutions.

    For example, the Web Services Description Language tool included with the .NET Framework SDK can query an XML Web service published on the Web, parse its WSDL description, and produce C# or Visual Basic source code that your application can use to become a client of the XML Web service. The source code can create classes derived from classes in the class library that handle all the underlying communication using SOAP and XML parsing. Although you can use the class library to consume XML Web services directly, the Web Services Description Language tool and the other tools contained in the SDK facilitate your development efforts with the .NET Framework.

    If you develop and publish your own XML Web service, the .NET Framework provides a set of classes that conform to all the underlying communication standards, such as SOAP, WSDL, and XML. Using those classes enables you to focus on the logic of your service, without concerning yourself with the communications infrastructure required by distributed software development.

    Finally, like Web Forms pages in the managed environment, your XML Web service will run with the speed of native machine language using the scalable communication of IIS.

    C# Version 3.0 Specification

    Overview of C# 3.0

    C# 3.0 ("C# Orcas") introduces several language extensions that build on C# 2.0 to support the creation and use of higher-order, functional-style class libraries. The extensions enable construction of compositional APIs that have expressive power equal to that of query languages in domains such as relational databases and XML. The extensions include:

    Implicitly typed local variables, which permit the type of local variables to be inferred from the expressions used to initialize them.
    Extension methods, which make it possible to extend existing types and constructed types with additional methods.
    Lambda expressions, an evolution of anonymous methods that provides improved type inference and conversions to both delegate types and expression trees.
    Object initializers, which ease construction and initialization of objects.
    Anonymous types, which are tuple types automatically inferred and created from object initializers.
    Implicitly typed arrays, a form of array creation and initialization that infers the element type of the array from an array initializer.
    Query expressions, which provide a language-integrated syntax for queries that is similar to relational and hierarchical query languages such as SQL and XQuery.
    Expression trees, which permit lambda expressions to be represented as data (expression trees) instead of as code (delegates).
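    The sketch below exercises each of the extensions listed above in one place. The types Page, UrlExtensions and CSharp3Demo, the HostOf method and the sample URLs are hypothetical illustrations. They are not part of the C# specification or of the Website Fetcher project.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Extension methods must be declared in a non-generic static class.
public static class UrlExtensions
{
    // Extension method: callable as "http://example.com/".HostOf().
    // Relies on System.Uri, so the argument must be an absolute URI.
    public static string HostOf(this string url) => new Uri(url).Host;
}

// Hypothetical record of a fetched page (auto-implemented properties).
public class Page
{
    public string Url { get; set; }
    public int Links { get; set; }
}

public static class CSharp3Demo
{
    public static List<string> Run()
    {
        // Implicitly typed array (element type inferred as Page),
        // filled using object initializers instead of a constructor.
        var pages = new[]
        {
            new Page { Url = "http://example.com/", Links = 12 },
            new Page { Url = "http://example.org/a", Links = 3 },
            new Page { Url = "http://example.com/b", Links = 7 },
        };

        // Implicitly typed local variable, captured by a lambda that is
        // converted to a delegate type.
        var threshold = 5;
        Func<Page, bool> isHub = p => p.Links > threshold;

        // Query expression; each result is an anonymous type { Host, Links }.
        var hubs = from p in pages
                   where isHub(p)
                   orderby p.Links descending
                   select new { Host = p.Url.HostOf(), p.Links };

        return hubs.Select(h => h.Host + ":" + h.Links).ToList();
    }

    // The same lambda syntax assigned to Expression<...> produces an
    // expression tree (data) rather than a delegate (code).
    public static string DescribeExpression()
    {
        Expression<Func<int, bool>> isLarge = n => n > 10;
        Func<int, bool> compiled = isLarge.Compile();   // turn the data back into code
        return isLarge.Body + " evaluated at 12: " + compiled(12);
    }
}
```

    Run() returns the hosts of the pages with more than five links, largest first: "example.com:12" and "example.com:7". DescribeExpression() shows that the tree's body can be inspected as data (it prints as (n > 10)) and then compiled into an ordinary delegate.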

    C# Keywords

    Keywords are predefined reserved identifiers that have special meanings to the compiler. They cannot be used as identifiers in your program unless they include @ as a prefix. For example, @if is a legal identifier but if is not because it is a keyword.
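    The short sketch below illustrates the @ prefix. The class and variable names are arbitrary examples. Note that the @ is not part of the identifier itself, so @total and total name the same variable.

```csharp
public static class KeywordDemo
{
    public static int Sum()
    {
        int @if = 2;      // legal: the @ prefix lets a keyword be used as a name
        int @class = 3;   // legal for the same reason
        // int if = 2;    // would not compile: "if" is a keyword

        int @total = @if + @class;
        return total;     // "@total" and "total" refer to the same variable
    }
}
```

    KeywordDemo.Sum() returns 5.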

    CONCLUSION

    The proposed project, once implemented, can connect to websites and download data which, once indexed, can be used as input to a search engine.

    The software has a graphical user interface (GUI), so the user can interact with it easily. It is designed so that it can be extended to search any website in a real-time environment.

    To meet this requirement, we will build a multithreaded Windows application that downloads and stores the Uniform Resource Identifiers (URIs) of a typical website. Roughly, the crawler starts off by placing an initial set of URLs in a queue, where all URIs to be retrieved are kept and prioritized. From this queue it takes a URL, downloads the page, extracts any URLs in the downloaded page, and puts the new URLs in the queue. This process is repeated until the Fetcher decides to stop.
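    The crawl loop described above can be sketched as follows. This is a minimal, single-threaded illustration under stated assumptions, not the project's implementation. The name CrawlSketch, the fetchLinks delegate (a hypothetical stand-in for "download the page and extract its URLs") and the maxPages stop condition are all assumptions. The queue is processed in plain first-in, first-out (breadth-first) order rather than by any particular priority scheme.

```csharp
using System;
using System.Collections.Generic;

public static class CrawlSketch
{
    // seeds:      the initial set of URLs placed in the queue
    // fetchLinks: hypothetical stand-in for downloading a page and
    //             extracting the URLs it contains
    // maxPages:   assumed stop condition (page budget)
    public static List<string> Crawl(
        IEnumerable<string> seeds,
        Func<string, IEnumerable<string>> fetchLinks,
        int maxPages)
    {
        var queue = new Queue<string>();     // URLs waiting to be fetched (FIFO)
        var seen = new HashSet<string>();    // every URL ever queued, to avoid repeats
        var visited = new List<string>();    // pages fetched, in crawl order

        foreach (var seed in seeds)
            if (seen.Add(seed)) queue.Enqueue(seed);

        // Repeat until the queue is empty or the page budget is used up.
        while (queue.Count > 0 && visited.Count < maxPages)
        {
            var url = queue.Dequeue();
            visited.Add(url);                        // "download" the page

            foreach (var link in fetchLinks(url))    // extract its URLs
                if (seen.Add(link)) queue.Enqueue(link); // queue only new URLs
        }
        return visited;
    }
}
```

    With a link graph where a links to b and c, b links to a and d, and c links to d, crawling from "a" visits a, b, c, d in that order. A real fetcher would replace fetchLinks with an HTTP download plus HTML link extraction. A multithreaded version would also need thread-safe structures such as ConcurrentQueue<T> from System.Collections.Concurrent. Both are left out here to keep the loop itself visible.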
