Date post: | 08-Apr-2018 |
Category: |
Documents |
Upload: | manju-yanamala |
8/6/2019 softcopyvasu
Website Fetcher
Abstract
On Demand Web Spidering and Fetching is a multithreaded Windows application that
downloads and stores Web pages, identified by their Uniform Resource Identifiers (URIs),
for a Web search engine. Roughly, a crawler starts off by placing an initial set of URLs,
S0, in a queue, where all URLs to be retrieved are kept and prioritized. From this queue,
the crawler gets a URL (in some order), downloads the page, extracts any URLs in the
downloaded page, and puts the new URLs in the queue. This process is repeated until the
crawler decides to stop. Collected pages are later used for other applications, such as a
Web search engine or a Web cache.
On Demand Web Spidering and Fetching aims to develop a user interface that retrieves
information about a given website. It is a multithreaded Windows application that
downloads and stores the Uniform Resource Identifiers of a typical website, and it serves
as a backend processing component for a search engine. The results gathered by the
Website Fetcher are passed to the indexer, which indexes the page data so that search
queries return results faster. Once implemented, the proposed project can connect to
websites and download data which, once indexed, can be supplied to the search engine.
A crawler is a program that visits Web sites and reads their pages and other information
in order to create entries for a search engine index. The major search engines on the Web
all have such a program, which is also known as a "spider" or a "bot."
ACKNOWLEDGMENT
We express our thanks and gratitude to Almighty God, and to our parents and friends,
without whose sustained support we could not have completed this project in 2008-2011.
We wish to place on record our deep sense of gratitude to our principal Mr. G. Sudhakar,
H.O.D. Mr. M. Sreenivas, and project guide Mr. Brahma for their constant motivation and
valuable help throughout the project work. We express our gratitude to iMatrix
Technologies for their valuable suggestions and advice throughout the project. We also
extend our thanks to the other faculty members for their cooperation during our course.
Finally, we would like to thank all of them.
INDEX
S.NO TITLE
1. Organization Profile
2. Project Overview
3. Aim & Scope of the Project
4. Existing System
5. Proposed System
6. Software Requirements
7. Hardware Requirements
8. Feasibility Study
9. System Design Introduction
10. Database Design
11. Data Dictionary
12. UML Diagrams
13. Unit Testing
14. Integration Testing
15. System Testing
16. System Implementation
17. Screens
18. Reports
19. Technology Specification
20. Conclusion
21. Bibliography and References
INTRODUCTION
ORGANIZATION PROFILE: IMATRIX TECHNOLOGIES
iMatrix is an innovative organization having a unique blend of industry and academia.
We would like to introduce ourselves as a team of fully qualified professionals in the
field of IT education. We are dedicated to the development of industry-driven academic
programs that bridge the gap between students and employers, benefiting both.
We provide all-inclusive and cost-effective training for students looking to
take the first step towards new careers. Effective learning happens when the student is
stimulated and challenged to go beyond their current frame of thinking and create a new
reality about the subject. We at iMatrix always strive for the best. The curriculum and
course material have been prepared after intensive research and consultation with
industry and academia so that they can be easily grasped by students of all levels.
We are confident that all of the training we deliver is of a high standard, and we are
constantly striving to advance and become even better. All our training modules
are well designed, well equipped, fit for purpose and delivered by trainers who are
motivational and inspirational, trainers who can make learning interesting and fun. Every
student has the freedom to discuss and learn. We understand the current requirements of
both the industry and the students. A student can gain industry experience while
studying at the institute.
Although we have the best resources to deliver the best results, we believe that it is not
only money or manpower but team effort that enables an organization to reach new heights.
Our team provides a highly conducive and congenial environment to facilitate the process
of learning. Our faculty consists only of experienced working professionals, so that
they can share their expertise with the students, and the faculty members keep updating
themselves. Because they have already solved the same problems the students are likely
to encounter, they are equipped to share their real-world experience and practical
solutions with the students. Everything the students learn is practical, relevant, and up to
date.
Why Us?
We have certified instructor-led training.
We provide you unlimited access to our labs to practice what you've learned.
We provide you with test preparation tools to assist you in preparing for your exams.
We assist you in preparing your resume and entering the workforce.
We provide books, reference materials, and guide you through the learning process.
At iMatrix Technologies, we provide Live Project Training to B.E.
Computers, B.E. IT (Information Technology), B.Tech, MBA, MCA and M.Tech students.
Our project training is based on live projects from customers across the globe. The projects
are on various platforms such as VB.NET, ASP.NET, VC++.NET, C#.NET, Java, JSP,
Servlets, J2EE and J2ME.
We provide the students with adequate infrastructure so that they
can practice and further enhance their skills. Practicing on their own gives them
the confidence required to meet the challenging demands of the job world. We prepare
students for the real-world software industry. The students become industry-ready
professionals after completing our Live Project Training program.
Our intent is not only to teach a language but also to impart practical knowledge.
This training program helps the students gain practical live experience,
thereby strengthening the knowledge they gained during their graduation. This practical
hands-on experience helps them fill the gap that prevents the majority of freshers from
being considered by prospective employers due to lack of experience. Our training program
emphasizes developing the student's ability to think dynamically and logically.
The Live Project Training program gives candidates hands-on experience with
the real-world development process through the complete Software Development Life
Cycle. Our live project training program consists of two phases: the first is technology
training and the second is project execution.
Phase 1 - Technology Training
Our rigorous technology training enables the students to
undertake in depth training, both theoretical and practical
sessions. Training places a high value on sharing information
and experience.
Stress on the basics
Interactive Sessions
Comprehensive material
Practice Tests
Attention to every individual
Phase 2 - Project Execution
In this phase, the students implement the knowledge they have secured in phase one.
They design, code, test and deploy the application.
Challenging projects dealing with cutting edge technologies.
All students are evaluated and mentored by people from IT
Industry.
All the industry best practices are explained and are
implemented in the projects.
Latest tools are used to execute the projects.
Weekly project review, attendance, workshops, final test and interview are used to assess
the students.
Certificates are awarded on successful completion of the project.
Other Features
Friendly faculty and a lively work atmosphere.
Seminars from Industry professionals.
Complete module is designed and executed under the guidance of IT professionals.
Attendance is shared with the college on a fortnightly basis.
Constant inputs will be taken from the respective HODs of the students.
Project Overview
The Website Fetcher is a multithreaded Windows application that downloads and stores
Web pages, identified by their Uniform Resource Identifiers (URIs), for a Web search
engine. Roughly, a crawler starts off by placing an initial set of URLs, S0, in a queue,
where all URLs to be retrieved are kept and prioritized. From this queue, the crawler gets
a URL (in some order), downloads the page, extracts any URLs in the downloaded page,
and puts the new URLs in the queue. This process is repeated until the crawler decides to
stop. Collected pages are later used for other applications, such as a Web search engine
or a Web cache.
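The crawl loop described above can be sketched in a few lines. This is only an illustrative Python sketch, not the project's C# implementation: the FAKE_WEB dictionary, the example.test URLs, the crawl function and its max_pages stop condition are all assumptions made so that the example runs without network access.

```python
from collections import deque
import re

# Hypothetical in-memory "web": URL -> page text. A real fetcher would
# download each page over HTTP instead of reading from this dict.
FAKE_WEB = {
    "http://example.test/": 'Home <a href="http://example.test/a">A</a> '
                            '<a href="http://example.test/b">B</a>',
    "http://example.test/a": 'Page A <a href="http://example.test/b">B</a>',
    "http://example.test/b": 'Page B <a href="http://example.test/">Home</a>',
}

LINK_PATTERN = re.compile(r'href="([^"]+)"')


def crawl(seed_urls, fetch, max_pages=100):
    """Breadth-first crawl; returns {url: page_text} for every page fetched."""
    queue = deque(seed_urls)   # URLs waiting to be retrieved
    seen = set(seed_urls)      # URLs already queued, so none is fetched twice
    pages = {}
    while queue and len(pages) < max_pages:      # the crawler decides to stop
        url = queue.popleft()                    # get a URL (FIFO order here)
        text = fetch(url)                        # download the page
        if text is None:
            continue                             # unknown page: skip it
        pages[url] = text
        for link in LINK_PATTERN.findall(text):  # extract URLs from the page
            if link not in seen:
                seen.add(link)
                queue.append(link)               # put new URLs in the queue
    return pages


pages = crawl(["http://example.test/"], FAKE_WEB.get)
print(sorted(pages))
```

A first-in, first-out queue gives breadth-first order; a prioritized crawler would replace the deque with a priority queue.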
As the size of the Web grows, it becomes more difficult to retrieve the whole or a
significant portion of the Web using a single process. Therefore, many search engines
run multiple processes in parallel to perform the above task, so that the download rate is
maximized. We refer to this type of fetcher as a parallel crawler. Applications of this type
are often used in search engines, where there is a need to collect all the URLs relevant to
a query and index them by priority.
This application is a .NET-based fetcher very similar to Googlebot, Google's crawler. It
serves as a backend processing component for a search engine. The
results (URI data) gathered by the Website Fetcher are given to an indexer, which
indexes the page data so that search queries return results faster.
Configurator module:
Mime types:
Here we set the kinds of data we need to extract from a particular URI, such as whether
or not we need string data, Boolean data and image information.
Output settings:
Here we set the kinds of data we need to extract from a particular URI, such as whether
or not we need string data, Boolean data and image information.
Advanced settings:
These settings are made by the user in order to restrict certain kinds of websites,
for example those with domain names ending in .NET or .AC.IN.
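A domain restriction of this kind can be sketched as a suffix check on the host name. The Python function below is a hypothetical illustration: the name is_restricted and the choice to treat the configured suffixes as a block list are assumptions, since the report does not say whether restricted domains are excluded or are the only ones allowed.

```python
from urllib.parse import urlparse


def is_restricted(url, restricted_suffixes):
    """Return True if the URL's host ends with one of the given suffixes."""
    # hostname is lower-cased by urlparse; it is None when the URL has no scheme
    host = urlparse(url).hostname or ""
    return any(host.endswith(suffix.lower()) for suffix in restricted_suffixes)


suffixes = [".NET", ".AC.IN"]
print(is_restricted("http://www.example.net/page", suffixes))
print(is_restricted("http://college.ac.in/index.html", suffixes))
print(is_restricted("http://example.com/", suffixes))
```

Including the leading dot in each suffix keeps a host such as internet.com from matching the .NET rule.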
Multithreaded downloader:
The multithreaded downloader is responsible for starting threads and
obtaining information about the website being fetched. It starts the threads and
pushes all URIs onto one queue. Each thread starts with one URI from the queue and,
after completing it, moves on to the next URI in the queue. In this module a folder is
created in the path chosen by the user, and files named after the URIs are created in it
to hold the static information.
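The behaviour of the multithreaded downloader can be sketched as a pool of worker threads draining one shared queue. This Python sketch is an assumption-laden illustration rather than the project's C# code: download_all, fake_fetch, the thread count and the percent-encoded file names are all hypothetical, and the fetch function is injected so the example runs offline.

```python
import queue
import tempfile
import threading
from pathlib import Path
from urllib.parse import quote


def download_all(uris, fetch, output_dir, thread_count=4):
    """Fetch every URI with worker threads; return (saved, errors) dicts."""
    work = queue.Queue()
    for uri in uris:
        work.put(uri)                    # push all URIs onto one queue

    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    saved, errors = {}, {}
    lock = threading.Lock()              # protects the two result dicts

    def worker():
        while True:
            try:
                uri = work.get_nowait()  # take the next URI, if any remain
            except queue.Empty:
                return
            try:
                content = fetch(uri)
                # File name derived from the URI; quote() removes "/" and ":".
                path = output_dir / (quote(uri, safe="") + ".html")
                path.write_text(content, encoding="utf-8")
                with lock:
                    saved[uri] = path
            except Exception as exc:     # keep failures in a separate list
                with lock:
                    errors[uri] = str(exc)

    threads = [threading.Thread(target=worker) for _ in range(thread_count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return saved, errors


def fake_fetch(uri):
    """Stand-in for an HTTP download (hypothetical)."""
    if uri.endswith("/missing"):
        raise IOError("404 Not Found")
    return "<html>" + uri + "</html>"


with tempfile.TemporaryDirectory() as folder:
    saved, errors = download_all(
        ["http://example.test/a", "http://example.test/missing"],
        fake_fetch, folder, thread_count=2)
    print(len(saved), "saved;", len(errors), "failed")
```

Recording failures in their own dictionary keeps them available for separate display instead of mixing them with the stored pages.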
Aim and Scope of the project:
On Demand Web Spidering and Fetching aims to develop a user interface that
retrieves information about a given website. It is a multithreaded
Windows application that downloads and stores the Uniform Resource Identifiers of
a typical website, and it serves as a backend processing component
for a search engine. The results gathered by the Website Fetcher are passed to the
indexer, which indexes the page data so that search queries return results faster. Once
implemented, the proposed project can connect to websites and download data
which, once indexed, can be supplied to the search engine.
The aim of the Action Plan-BS Project has been the preparation of an Action Plan in
Science and Technology for the Black Sea countries and its adoption at the level of
relevant Ministers. To achieve its goal the project included preparatory work on a
draft Action Plan and the organization of two events: a High Level Officials meeting
and a Ministerial Meeting.
Project scope
When we look at search engines, the first question that arises is how
they are able to display information in the form of many links. The typical answer
is that the search engine has information about the different
websites and their corresponding links, called URIs, in its database. The immediate and
interesting follow-up question is how the search engine gets information about all the sites.
Taking this point into consideration, we are going to develop a user interface which
brings the information about a particular given website. If we know how to get
information about a particular website, then it is very simple to implement this for all
websites.
SYSTEM ANALYSIS
SOFTWARE REQUIREMENTS SPECIFICATION
Problem Definition:
On Demand Web Spidering and Fetching is a multithreaded
Windows application that downloads and stores Web pages, identified by their
Uniform Resource Identifiers (URIs), for a Web search engine. Roughly, a crawler
starts off by placing an initial set of URLs, S0, in a queue, where all URLs to be
retrieved are kept and prioritized. From this queue, the crawler gets a URL (in some
order), downloads the page, extracts any URLs in the downloaded page, and puts the
new URLs in the queue. This process is repeated until the crawler decides to stop.
Collected pages are later used for other applications, such as a Web search engine or
a Web cache.
Why the particular project is chosen:
With the increasing growth in the number of industries and the competitive race for
perfection and profit, industries around the globe are on the lookout for
a platform to exhibit their products by the most convenient, profitable and successful means.
It is a web-based application, developed using ASP.NET with the C# language. The
.NET Framework is a new computing platform that simplifies application development in
the highly distributed environment of the Internet.
Being web-based, it avoids the management service issues that
we faced in the existing system. The present system has advantages such as fast
access to data, good management service and a good security mechanism. Certain
limitations still apply to this system.
Reasons for choosing Front-End and Back-End:
It is a web-based application, developed using ASP.NET with the C# language. The
.NET Framework is a new computing platform that simplifies application development in
the highly distributed environment of the Internet. The .NET Framework is designed to
fulfill the following objectives:
To provide a consistent object-oriented programming environment whether
object code is stored and executed locally, executed locally but Internet-
distributed, or executed remotely.
To provide a code-execution environment that minimizes software deployment
and versioning conflicts.
To provide a code-execution environment that guarantees safe execution of
code, including code created by an unknown or semi-trusted third party.
To provide a code-execution environment that eliminates the performance
problems of scripted or interpreted environments.
To make the developer experience consistent across widely varying types of
applications, such as Windows-based applications and Web-based
applications.
To build all communication on industry standards to ensure that code based on
the .NET Framework can integrate with any other code.
The .NET Framework can be hosted by unmanaged components that load the
common language runtime into their processes and initiate the execution of
managed code, thereby creating a software environment that can exploit both
managed and unmanaged features.
Existing System:
As the size of the web grows, it becomes more difficult to retrieve the whole or a
significant portion of the web's information using a single process.
As a result, the download rate falls and the download time rises.
Storage of static pages is not usually seen in any of the search engines.
Error pages, when encountered, are not stored separately.
Proposed System:
When we observe many search engines the first question that arises in our mind is
how the information is displayed in the form of many links.
The typical answer is, the search engine obviously has the information about
different websites and corresponding links called URIs in its database.
The immediate and interesting question is how the search engine gets the
information about all the sites.
Taking this point into consideration we are going to develop a user interface
which brings the information about a particular given website.
If we know how to get information about a particular website then it is
very simple to implement this for all websites.
Multiple processes are run in parallel to perform the above task, so that download
rate is maximized and downloading time is minimized.
We refer to this type of fetcher as a parallel crawler.
Static pages are stored in the folder chosen by the user.
Any difficulties encountered can be viewed separately in the Errors view.
SOFTWARE & HARDWARE SPECIFICATIONS
SOFTWARE SPECIFICATIONS:
Microsoft .NET Framework
Microsoft C# .NET language
Microsoft Windows 2000
Microsoft Visual Studio 2005
SQL Server 2005
HARDWARE SPECIFICATIONS:
PROCESSOR: P4
RAM: 2 MB
HARD DISK : 60 GB
OPERATING SYSTEM:Windows 2000 or higher
Feasibility Study
A feasibility study is a high-level capsule version of the entire process, intended to answer a
number of questions such as: What is the problem? Is there any feasible solution to the given
problem? Is the problem even worth solving? A feasibility study is conducted once the
problem is clearly understood. It is necessary to determine whether the proposed
system is feasible by considering technical, operational and economic factors. With
a detailed feasibility study, the management will have a clear-cut view of the
proposed system.
The following feasibilities are considered for the project in order to ensure that the
project is viable and does not have any major obstructions. The feasibility study
encompasses the following:
Technical Feasibility
Economical Feasibility
Operational Feasibility
Technical Feasibility
In this step, we verify whether the proposed system is technically feasible, i.e., whether
all the technologies required to develop the system are readily available.
Technical feasibility determines whether the organization has the technology and skills
necessary to carry out the project and, if not, how they should be obtained. The system is
feasible on the following grounds:
All necessary technology exists to develop the system.
The system is highly flexible and can be expanded further.
Economical Feasibility
In this step, we verify which proposal is more economical. We compare the financial
benefits of the new system with the investment. The new system is economically feasible
only when the financial benefits exceed the investment and expenditure.
Economic feasibility determines whether the project goal can be achieved within the resource
limits allocated to it. It must determine whether it is worthwhile to proceed with the
entire project or whether the benefits obtained from the new system are not worth the
costs. Financial benefits must equal or exceed the costs. Here we should
consider:
The cost to conduct a full system investigation.
The cost of hardware and software for the class of application being considered.
The cost of the development tool.
The cost of maintenance, etc.
OPERATIONAL FEASIBILITY
Proposed projects are beneficial only if they can be turned into information systems that
will meet the organization's operating requirements. Simply stated, this test of feasibility
asks if the system will work when it is developed and installed. Are there major barriers
to implementation? Here are questions that will help test the operational feasibility of a
project:
Is there sufficient support for the project from management and from users? If the current
system is well liked and used to the extent that people will not be able to see reasons for
change, there may be resistance.
Are the current business methods acceptable to the users? If they are not, users may
welcome a change that will bring about a more operational and useful system.
Have the users been involved in the planning and development of the project? Early
involvement reduces the chances of resistance to the system in general and increases the
likelihood of a successful project.
Since the proposed system was to help reduce the
hardships encountered in the existing manual system, the new system was considered to
be operationally feasible.
SYSTEM DESIGN
DATABASE DESIGN
The data pertaining to the proposed system is so voluminous that a careful design of the
database must precede storing the data in the database.
A database management system provides flexibility in the storage and retrieval of data
and the production of information. The DBMS is a bridge between the application
program, which determines what data are needed and how they are processed, and the
operating system of the computer, which is responsible for placing data on the
magnetic storage devices. A schema defines the database and a subschema defines
the portion of the database that a specific program will use.
TYPES OF DATABASE DESIGN
CONCEPTUAL SCHEMA
Once a database designer is aware of the data which is to be stored within the database,
they must then determine where the dependencies are within the data. Sometimes when data
is changed, other data that is not visible is changed as well. For example, in a list of
names and addresses, assuming a situation where multiple people can have the same
address but one person cannot have more than one address, the address is dependent
upon the name: given a name, the address can be uniquely determined. The reverse does
not hold: given an address, the name cannot be uniquely determined, because several
people may share that address. One attribute can change and not
another.
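Whether one column determines another can be checked mechanically on sample data. The Python sketch below is illustrative only: the determines function and the sample people rows are hypothetical. Under the stated assumption that several people may share an address while each person has exactly one, a name determines an address but an address does not determine a name.

```python
def determines(rows, lhs, rhs):
    """Return True if each value of column lhs maps to a single value of rhs."""
    seen = {}
    for row in rows:
        key, value = row[lhs], row[rhs]
        # setdefault stores the first value seen for key and returns it after
        if seen.setdefault(key, value) != value:
            return False        # same lhs value, two different rhs values
    return True


# Hypothetical sample data: two people share one address.
people = [
    {"name": "Asha", "address": "12 Lake Road"},
    {"name": "Ravi", "address": "12 Lake Road"},
    {"name": "Meena", "address": "7 Hill Street"},
]

print(determines(people, "name", "address"))   # True
print(determines(people, "address", "name"))   # False
```

A check like this can only refute a dependency on the data at hand; whether the dependency holds in general is a design decision taken with the domain expert.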
LOGICALLY STRUCTURING DATA
Once the relationships and dependencies amongst the various pieces of information have
been determined, it is possible to arrange the data into a logical structure which can then
be mapped into the storage objects supported by the database management system. In the
case of relational databases the storage objects are tables which store data in rows and
columns. Each table may represent an implementation of either a logical object or a
relationship joining one or more instances of one or more logical objects. Relationships
between tables may then be stored as links connecting child tables with parents. Since
complex logical relationships are themselves tables they will probably have links to
more than one parent. In an Object database the storage objects correspond directly to
the objects used by the Object-oriented programming language used to write the
applications that will manage and access the data. The relationships may be defined as
attributes of the object classes involved or as methods that operate on the object classes.
PHYSICAL DATABASE DESIGN
The physical design of the database specifies the physical configuration of the database
on the storage media. This includes detailed specification of data elements, data types,
indexing options and other parameters residing in the DBMS data dictionary. It is the
detailed design of a system that includes modules and the database's hardware and software
specifications.
THE DESIGN PROCESS
The design process consists of the following steps:
Determine the purpose of your database - This helps prepare you for the
remaining steps.
Find and organize the information required - Gather all of the types of
information you might want to record in the database, such as product
name and order number.
Divide the information into tables - Divide your information items into major
entities or subjects, such as Products or Orders. Each subject then
becomes a table.
Turn information items into columns - Decide what information you
want to store in each table. Each item becomes a field, and is
displayed as a column in the table. For example, an Employees table
might include fields such as Last Name and Hire Date.
Specify primary keys - Choose each table's primary key. The primary
key is a column that is used to uniquely identify each row. An
example might be Product ID or Order ID.
Set up the table relationships - Look at each table and decide how the
data in one table is related to the data in other tables. Add fields to
tables or create new tables to clarify the relationships, as necessary.
Refine your design - Analyze your design for errors. Create the tables and
add a few records of sample data. See if you can get the results you
want from your tables. Make adjustments to the design, as needed.
Apply the normalization rules - Apply the data normalization rules to
see if your tables are structured correctly. Make adjustments to the
tables, as needed.
DETERMINING DATA TO BE STORED
In a majority of cases, a person who is doing the design of a database is a person with
expertise in the area of database design, rather than expertise in the domain from which
the data to be stored is drawn e.g. financial information, biological information etc.
Therefore the data to be stored in the database must be determined in cooperation with a
person who does have expertise in that domain, and who is aware of what data must
be stored within the system. This process is one which is generally considered part of
requirements analysis, and requires skill on the part of the database designer to elicit the
needed information from those with the domain knowledge. This is because those with
the necessary domain knowledge frequently cannot express clearly what their system
requirements for the database are as they are unaccustomed to thinking in terms of the
discrete data elements which must be stored. Data to be stored can be determined by
Requirement Specification.
Data Model
The organization of the data is represented by a data model and identifies the logical
organization of data. In a model of real world similar things are usually grouped into
classes of object called object types.
A data model is a pattern according to which data are logically organized. It consists
of the named logical units of data and expresses the relationships among the data as
determined by the interpretation of the model of real world.
In the relational data model all attribute relationships and all associations are
represented as relations. There is no distinction, even at the model level, between the
different kinds of relations. Syntactically all the relations are the same. The data
model does not allow for the introduction of additional semantic information to
distinguish different relations according to their properties.
The main types of data model are:
Network data model
Hierarchical data model
Relational data model
Relational data Model
The relational data model is a formal model for representing relationships
among attributes of an entity set and the association between entity sets.
In the relational data model all attribute relationships and all associations
are represented as relations. There is no distinction, even at the model level, between
the different kinds of relations. Syntactically all the relations are the same. The data
model does not allow for the introduction of additional semantic information to
distinguish different relations according to their properties.
NORMALIZATION
Normalization theory is built around the concept of normal forms. A
relation is said to be in particular normal form if it satisfies a certain specified set of
constraints.
FIRST NORMAL FORM
A relation R is in first normal form if and only if all underlying domains
contain atomic values only.
SECOND NORMAL FORM
A relation R is said to be in second normal form if and only if it is in first
normal form and every non-key attribute is fully dependent on the primary key.
THIRD NORMAL FORM
A relation R is said to be in third normal form if and only if it is in second normal
form and every non-key attribute is non-transitively dependent on the primary key.
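As a small illustration of first normal form, the Python sketch below splits a list-valued column into one row per atomic value. The to_first_normal_form function, the sample rows and the column names website_id and url_name are hypothetical and chosen only to echo the project's domain.

```python
def to_first_normal_form(rows, multi_valued_column):
    """Split a list-valued column so that every cell holds one atomic value."""
    flat = []
    for row in rows:
        for value in row[multi_valued_column]:
            new_row = dict(row)                  # copy the other columns
            new_row[multi_valued_column] = value
            flat.append(new_row)
    return flat


# Hypothetical unnormalized data: one row holds several URLs in one cell.
unnormalized = [
    {"website_id": "site-1", "url_name": ["/index.html", "/about.html"]},
    {"website_id": "site-2", "url_name": ["/home.html"]},
]

for row in to_first_normal_form(unnormalized, "url_name"):
    print(row)
```

After the split, each row carries a single URL, so the url_name domain contains atomic values only.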
Data Dictionary
Thread count:
Column name        Data type      Allow nulls
id                 int            Yes

Website log:
Column name        Data type      Allow nulls
Websiteid_url_id   int            No
Website id         varchar(150)   Yes
url name           varchar(500)   Yes

Website url:
Column name        Data type      Allow nulls
Websiteid_url_id   int            No
Website id         varchar(150)   Yes
url name           varchar(500)   Yes
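The three tables above could be declared roughly as follows. This is a sketch under assumptions: it uses Python's built-in sqlite3 module with an in-memory database as a stand-in for the SQL Server 2005 database named in the report, and the snake_case table and column names are adaptations of the dictionary entries, not names taken from the project.

```python
import sqlite3

# Table and column names are adapted from the data dictionary (assumption).
SCHEMA = """
CREATE TABLE thread_count (
    id INTEGER                          -- nulls allowed
);
CREATE TABLE website_log (
    websiteid_url_id INTEGER NOT NULL,
    website_id       VARCHAR(150),      -- nulls allowed
    url_name         VARCHAR(500)       -- nulls allowed
);
CREATE TABLE website_url (
    websiteid_url_id INTEGER NOT NULL,
    website_id       VARCHAR(150),
    url_name         VARCHAR(500)
);
"""

conn = sqlite3.connect(":memory:")      # throwaway in-memory database
conn.executescript(SCHEMA)
conn.execute(
    "INSERT INTO website_url VALUES (?, ?, ?)",
    (1, "example-site", "http://example.test/"),
)
rows = conn.execute(
    "SELECT websiteid_url_id, website_id, url_name FROM website_url"
).fetchall()
print(rows)
```

The NOT NULL constraint mirrors the "Allow nulls: No" entries; the dictionary does not mark any column as a primary key, so none is declared here.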
UML DIAGRAM
Use case Diagrams:
The main purpose of a use case diagram is to show what system functions are
performed for which actors. Roles of the actors in the system can be depicted.
It mainly specifies the behavior of the system.
Use case diagrams depict:
Use cases. A use case describes a sequence of actions that provide
something of measurable value to an actor and is drawn as a horizontal
ellipse.
Actors. An actor is a person, organization, or external system that plays a
role in one or more interactions with your system. Actors are drawn as
stick figures.
Associations. Associations between actors and use cases are indicated in
use case diagrams by solid lines. An association exists whenever an actor
is involved with an interaction described by a use case. Associations are
modeled as lines connecting use cases and actors to one another, with an
optional arrowhead on one end of the line. The arrowhead is often used to
indicate the direction of the initial invocation of the relationship or to
indicate the primary actor within the use case. The arrowheads are
typically confused with data flow, and as a result their use is best avoided.
Class Diagrams
A class diagram in the Unified Modeling Language (UML) is a type of static
structure diagram that describes the structure of a system by showing the system's
classes, their attributes, and the relationships between the classes.
Relationships:
A relationship is a general term covering the specific types of logical connections
found on class and objects diagrams. UML shows the following relationships:
Link
A Link is the basic relationship among objects. It is represented as a line
connecting two or more object boxes. It can be shown on an object diagram or class
diagram. A link is an instance of an association.
Association
Class diagram example of association between two classes
An Association represents a family of links. Binary associations (with two ends)
are normally represented as a line, with each end connected to a class box. Higher order
associations can be drawn with more than two ends. In such cases, the ends are connected
to a central diamond.
An association can be named, and the ends of an association can be adorned with
role names, ownership indicators, multiplicity, visibility, and other properties. There are
five different types of association. Bi-directional and uni-directional associations are the
most common ones. For instance, a flight class is associated with a plane class bi-
directionally. Associations can only be shown on class diagrams.
Example: "department offers courses" is an association relationship.
Aggregation
Class diagram showing Aggregation between two classes
Aggregation is a variant of the "has a" or association relationship; aggregation is
more specific than association. It is an association that represents a part-whole
relationship. As a type of association, an aggregation can be named and have the same
adornments that an association can. However, an aggregation may not involve more than
two classes.
Composition
Class diagram showing Composition between two classes at top and Aggregation
between two classes at bottom
Composition is a stronger variant of the "has a" or association relationship;
composition is more specific than aggregation. It is represented with a solid diamond
shape.
The UML graphical representation of a composition relationship is a filled
diamond shape on the containing class end of the tree of lines that connect contained
class(es) to the containing class.
Description:
The class diagram shows each class name together with its properties and
methods. The WebsiteFetcher class contains getURLs(), setURLs(), connect(), and
crawl(). The Crawler class contains parseURIs() and runthread(). The classes and
methods shown in the diagram are:
Crawler: parseURIs(), runthread()
Settings: MIMEtype(), outputpath()
WebsiteFetcher: getURLs(), setURLs(), connect(), crawl()
Queue: add(), save(), open()
USE CASE DIAGRAMS
USE CASE DIAGRAM:
Description:
The diagram above is a use case diagram. The actor performs the following
actions in turn: connecting to the WWW, getting URLs, classifying threads, URLs,
and errors, and storing static pages.
CONFIGURATOR MODULE:
Connecting www
URL Reading getting URLs
Classifying Threads,URLs,Errors
Storing static pages
Admin
Settings
Description:
The diagram above is a use case diagram. The actor configures three groups of
settings: the MIME type setting, which selects the data to download; the output
setting, which specifies the path of the output folder; and the advanced
setting, which restricts data.
CRAWLER VIEW:
Description:
The diagram above is a use case diagram. The actor can open three views: first
the thread view, next the request view, and last the error view.
MULTITHREADED DOWNLOADER:
Description:
The diagram above is a use case diagram. The actor first starts the threads;
next, the threads take URIs from the queue and download the information; last,
the downloaded information is stored in a folder.
SEQUENCE DIAGRAM
A sequence diagram in the Unified Modeling Language (UML) is a kind of
interaction diagram that shows how processes operate with one another and in what
order. It is a construct of a Message Sequence Chart.
Sequence diagrams are sometimes called Event-trace diagrams, event scenarios,
and timing diagrams.
A sequence diagram shows, as parallel vertical lines ("lifelines"), different
processes or objects that live simultaneously, and, as horizontal arrows, the messages
exchanged between them, in the order in which they occur. This allows the specification
of simple runtime scenarios in a graphical manner.
Collaboration Diagram:
[Diagram. Participants: Admin, Website, Threads, Errors view, URLs, Output path,
Static pages, Settings. Messages: connect, configure, Errors, start, get, store,
specify, perform.]
SYSTEM TESTING
TESTING
Software Testing and Quality Assurance:
Software Testing:
Software testing is a critical element of software quality assurance and
represents the ultimate review of specifications, design, and coding. If testing is
conducted successfully, it will uncover errors in the software. As a secondary benefit,
testing demonstrates that software functions appear to be working according to the
specifications and that performance requirements appear to have been met.
Why should we do testing?
Error free superior product
Quality assurance to the client
Black box testing takes an external perspective of the test object to derive test cases. These
tests can be functional or non-functional, though usually functional. The test designer
selects valid and invalid inputs and determines the correct output. There is no knowledge
of the test object's internal structure.
This method of test design is applicable to all levels of software testing: unit,
integration, functional testing, system and acceptance. The higher the level, and hence the
bigger and more complex the box, the more one is forced to use black box testing to
simplify. While this method can uncover unimplemented parts of the specification, one
cannot be sure that all existent paths are tested.
System Testing:
As more critical functions in an organization's business activity are automated, more
and more trust is placed in automated systems. This puts an ever-increasing burden on
the system analyst to ensure the quality of those systems. Quality depends on design,
development, testing, and implementation, and a weakness in any of these areas will
seriously jeopardize it.
Testing:
No system is ever perfect. Communication problems, programmer negligence, or time
constraints create errors that must be eliminated before the system is delivered to the
users. A system is tested for online response, volume of transactions, stress, recovery
from failure, and durability.
Testing strategy:
From a procedural point of view, testing is a series of three steps that are
implemented sequentially. Initially, each module is tested individually to ensure that
it functions properly as a unit; hence this step is called UNIT TESTING. Next, the
modules are assembled or integrated, and a set of higher-order tests is conducted.
Finally, validation testing checks the software against its requirements specification.
Unit Testing:
Unit testing focuses verification effort on the smallest unit of software design, the
module. Using the design document as a guide, important control paths are tested to
uncover errors within the boundary of the module.
Unit testing is of two types: white box testing and black box testing. Structural
testing is also referred to as white box or glass box testing.
Integration Testing:
Integration testing is a systematic technique for constructing the program structure while
at the same time conducting tests to uncover errors associated with interfacing. The
objective is to take unit tested modules and build the program structure that has been
dictated by the design.
Top-Down Integration:
Top-down integration is an incremental approach to the construction of the program
structure. Modules are integrated by moving downward through the control hierarchy,
beginning with the main control module and proceeding to the subordinate modules.
Bottom-Up Integration:
Bottom-up integration testing, as its name implies, begins construction and testing with
atomic modules and moves toward the higher modules. The type of integration used
depends on the application under development.
Information systems follow a top-down integration mechanism, moving from one level to
another in a hierarchical format.
6.4) Validation Testing:
This test concentrates on the software requirements specification, a document
that describes the user-visible attributes of the software. For an information system,
validation testing checks whether the end user is an authorized user; if not, the user
is not allowed to log on. When a requirement is submitted, care is taken that the user
does not leave it null, and the text the user enters must not be blank.
Test Case:
Test case 1: Valid Input
Priority (H, L): High
Test Objective: To get accurate output
Test Description: The user needs to enter a valid URL or URI in the address bar
provided in our application.
Requirements Verified: Yes
Test Environment: The .NET IDE must be in a running state. Threads should be
invoked as soon as we click the Go button.
Test Setup/Pre-Conditions: The .NET IDE should be in a running state. An address
must be entered.
Actions: The user selects the Go button to get all the static content present at
that URL/URI.
Expected Results: Service found; the files are downloaded and stored on our local
desktop, or an error message pops up.
Pass: Conditions pass: No Fail: No
Problems / Issues: NIL
Notes: Successfully Executed
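The test case above depends on telling a valid address from an invalid one before any thread is started. The listing below is only a minimal sketch of such a check, not the project's own code: the class name AddressValidator, the method name IsValidAddress, and the restriction to http and https addresses are all assumptions made for illustration.

```csharp
using System;

// Hypothetical helper (not part of the project's listing).
public static class AddressValidator
{
    // Returns true only for absolute http or https addresses.
    public static bool IsValidAddress(string text)
    {
        Uri uri;
        // Uri.TryCreate returns false instead of throwing on malformed input.
        if (!Uri.TryCreate(text, UriKind.Absolute, out uri))
            return false;

        // Assumption: the crawler only follows web addresses.
        return uri.Scheme == Uri.UriSchemeHttp
            || uri.Scheme == Uri.UriSchemeHttps;
    }
}
```

A black box test then only supplies inputs and compares the answers, for example a well-formed http address, free text, and an address with another scheme, without looking at how the check is written.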
SYSTEM IMPLEMENTATION
Design pattern Implementation
Implementing the Singleton Pattern in C#
The singleton pattern is one of the best-known patterns in software engineering.
Essentially, a singleton is a class which only allows a single instance of itself to be
created, and usually gives simple access to that instance. Most commonly, singletons
don't allow any parameters to be specified when creating the instance - as otherwise a
second request for an instance but with a different parameter could be problematic! (If the
same instance should be accessed for all requests with the same parameter, the factory
pattern is more appropriate.) This article deals only with the situation where no
parameters are required. Typically a requirement of singletons is that they are created
lazily - i.e. that the instance isn't created until it is first needed.
There are various different ways of implementing the singleton pattern in C#. I shall
present them here in reverse order of elegance, starting with the most commonly seen,
which is not thread-safe, and working up to a fully lazily-loaded, thread-safe, simple and
highly performant version. Note that in the code here, I omit the private modifier, as it is
the default for class members. In many other languages such as Java, there is a different
default, and private should be used.
All these implementations share four common characteristics, however:
A single constructor, which is private and parameterless. This prevents other
classes from instantiating it (which would be a violation of the pattern). Note that
it also prevents sub classing - if a singleton can be sub classed once, it can be sub
classed twice, and if each of those subclasses can create an instance, the pattern is
violated. The factory pattern can be used if you need a single instance of a base
type, but the exact type isn't known until runtime.
The class is sealed. This is unnecessary, strictly speaking, due to the above point,
but may help the JIT to optimize things more.
A static variable which holds a reference to the single created instance, if any.
A public static means of getting the reference to the single created instance,
creating one if necessary.
Note that all of these implementations also use a public static property Instance as the
means of accessing the instance. In all cases, the property could easily be converted to a
method, with no impact on thread-safety or performance.
This is a sample of the first version, which is not thread-safe:
// Bad code! Do not use!
public sealed class Singleton
{
    static Singleton instance = null;

    Singleton()
    {
    }

    public static Singleton Instance
    {
        get
        {
            if (instance == null)
            {
                instance = new Singleton();
            }
            return instance;
        }
    }
}
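The sample above is unsafe because two threads can both find instance equal to null and each create an object. One possible thread-safe, lazily created variant is sketched below. It assumes .NET Framework 4 or later, where Lazy<T> is available, and the class name LazySingleton is chosen here only to keep it apart from the sample above; it is not taken from the project.

```csharp
using System;

// Sketch only: requires .NET Framework 4 or later for Lazy<T>.
public sealed class LazySingleton
{
    // Lazy<T> created with a factory delegate is thread-safe by default,
    // so the factory runs at most once even under concurrent access.
    private static readonly Lazy<LazySingleton> lazy =
        new Lazy<LazySingleton>(() => new LazySingleton());

    // Private, parameterless constructor: no other class can instantiate it.
    private LazySingleton()
    {
    }

    // The instance is created on first access, not when the class is loaded.
    public static LazySingleton Instance
    {
        get { return lazy.Value; }
    }
}
```

Every read of LazySingleton.Instance returns the same object, and the four characteristics listed earlier (private constructor, sealed class, static holder, public static accessor) are all still present.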
Pseudo code
private void listViewRequests_SelectedIndexChanged(object sender,
    System.EventArgs e)
{
    // Show the request text of the selected row, if it has one.
    if (this.listViewRequests.SelectedItems.Count == 0)
        return;
    ListViewItem item =
        this.listViewRequests.SelectedItems[0];
    if (item.SubItems.Count > 2)
        this.textBoxRequest.Text = item.SubItems[2].Text;
}

AllMIMETypes = Settings.GetValue("Allow all MIME types", true);

// MIME types are the types that the crawler is allowed to download;
// the crawler includes a default set of types.
private bool bAllMIMETypes;
private bool AllMIMETypes
{
    get { return bAllMIMETypes; }
    set { bAllMIMETypes = value; }
}
// Construct the MIME types string from the settings XML file.
// Requires: using System.IO; using System.Windows.Forms; using System.Xml;
static string GetMIMETypes()
{
    string str = "";
    // check for settings xml file existence
    if (File.Exists(Application.StartupPath + "\\Settings.xml"))
    {
        XmlDocument doc = new XmlDocument();
        doc.Load(Application.StartupPath + "\\Settings.xml");
        XmlNode element =
            doc.DocumentElement.SelectSingleNode("SettingsForm-listViewFileMatches");
        if (element != null)
        {
            for (int n = 0; n < element.ChildNodes.Count; n++)
            {
                XmlNode xmlnode = element.ChildNodes[n];
                XmlAttribute attribute = xmlnode.Attributes["Checked"];
                // skip rows that are not checked
                if (attribute == null || attribute.Value.ToLower() != "true")
                    continue;
                // the row text is split into tab-separated fields
                string[] items = xmlnode.InnerText.Split('\t');
                if (items.Length > 1)
                {
                    // appends "type;" or "type[field1,field2];"
                    str += items[0];
                    if (items.Length > 2)
                        str += '[' + items[1] + ',' + items[2] + ']';
                    str += ';';
                }
            }
        }
    }
    return str;
}
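GetMIMETypes above produces one string in which each entry has the form "type;" or "type[first,second];". The report does not show how that string is read back, so the following is only a sketch of one way to split it again. The class names MimeRule and MimeRuleParser are hypothetical, and reading the two bracketed fields as minimum and maximum size is an assumption based on the numericUpDownMinSize and numericUpDownMaxSize controls in the listing.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical holder for one parsed entry (not part of the original listing).
public sealed class MimeRule
{
    public string Type;
    public string MinSize;  // kept as text; null when the entry has no brackets
    public string MaxSize;
}

public static class MimeRuleParser
{
    // Splits a string such as "text/html[0,100];image/gif;" into entries.
    public static List<MimeRule> Parse(string mimeTypes)
    {
        var rules = new List<MimeRule>();
        string[] entries = mimeTypes.Split(
            new[] { ';' }, StringSplitOptions.RemoveEmptyEntries);

        foreach (string entry in entries)
        {
            var rule = new MimeRule();
            int open = entry.IndexOf('[');
            if (open < 0)
            {
                // No size limits were stored for this type.
                rule.Type = entry;
            }
            else
            {
                rule.Type = entry.Substring(0, open);
                int close = entry.IndexOf(']', open);
                if (close > open)
                {
                    // Text between the brackets is "first,second".
                    string[] sizes =
                        entry.Substring(open + 1, close - open - 1).Split(',');
                    if (sizes.Length == 2)
                    {
                        rule.MinSize = sizes[0];
                        rule.MaxSize = sizes[1];
                    }
                }
            }
            rules.Add(rule);
        }
        return rules;
    }
}
```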
private void button7_Click(object sender, EventArgs e)
{
    FileTypeForm form = new FileTypeForm();
    if (form.ShowDialog() == DialogResult.OK)
    {
        ListViewItem item =
            this.listViewFileMatches.Items.Add(form.textBoxTypeDescription.Text);
        item.SubItems.Add(form.numericUpDownMinSize.Value.ToString());
        item.SubItems.Add(form.numericUpDownMaxSize.Value.ToString());
    }
}
private void button8_Click(object sender, EventArgs e)
{
    if (this.listViewFileMatches.SelectedItems.Count == 0)
        return;
    ListViewItem item =
        this.listViewFileMatches.SelectedItems[0];
    FileTypeForm form = new FileTypeForm();
    form.textBoxTypeDescription.Text = item.Text;
    if (item.SubItems.Count
    form.numericUpDownMinSize.Value =
        int.Parse(item.SubItems[1].Text);
    if (item.SubItems.Count
Classes: These are the predefined classes used in our project.
Network Class
Provides a property, event, and methods for interacting with the network to which the
computer is connected.
Syntax:
[HostProtectionAttribute(SecurityAction.LinkDemand, Resources = HostProtectionResource.ExternalProcessMgmt)]
public class Network
XmlReader Class
Represents a reader that provides fast, non-cached, forward-only access to XML data.
Namespace: System.Xml
Assembly: System.Xml (in System.Xml.dll)
Example:
public abstract class XmlReader : IDisposable
XmlReader provides forward-only, read-only access to a stream of XML data. The
XmlReader class conforms to the W3C Extensible Markup Language (XML) 1.0 and the
Namespaces in XML recommendations.
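As an illustration of the forward-only access described above, the sketch below walks once through an XML string and collects the element names in document order. The class name SettingsReaderSketch and the method name ElementNames are hypothetical and are not taken from the project.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical example class; the names are for illustration only.
public static class SettingsReaderSketch
{
    // Returns the names of all elements in document order.
    public static List<string> ElementNames(string xml)
    {
        var names = new List<string>();
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            // Read() moves forward one node at a time; there is no way back.
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element)
                    names.Add(reader.Name);
            }
        }
        return names;
    }
}
```

Because the reader never keeps the whole document in memory, this style suits large files, whereas XmlDocument (used in GetMIMETypes) loads the full tree first.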
XmlWriter Class
Represents a writer that provides a fast, non-cached, forward-only means of generating
streams or files containing XML data.
Namespace: System.Xml
Assembly: System.Xml (in System.Xml.dll)
Example:
public abstract class XmlWriter : IDisposable
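A short sketch of how XmlWriter generates XML in a single forward pass is given below. The element names "settings" and "item" and the class name SettingsWriterSketch are assumptions made for the example; the layout of the project's real Settings.xml file is not shown in this report.

```csharp
using System.Text;
using System.Xml;

// Hypothetical example class; element names are assumptions.
public static class SettingsWriterSketch
{
    // Writes one checked or unchecked MIME type entry and returns the XML text.
    public static string Write(string mimeType, bool isChecked)
    {
        var output = new StringBuilder();
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };

        using (XmlWriter writer = XmlWriter.Create(output, settings))
        {
            writer.WriteStartElement("settings");
            writer.WriteStartElement("item");
            writer.WriteAttributeString("Checked", isChecked ? "true" : "false");
            writer.WriteString(mimeType);
            writer.WriteEndElement(); // closes item
            writer.WriteEndElement(); // closes settings
        }
        // The writer is flushed when the using block ends.
        return output.ToString();
    }
}
```

The Checked attribute mirrors the one that GetMIMETypes reads, so a file written in this style could be filtered in the same way.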
SCREENS
SCREEN DESIGNING:
THREADS VIEW:
Description:
The system first establishes a connection, after which the user gives one URL
(Uniform Resource Locator) as input. The application then starts fetching the
information for that URL by starting the thread process. In this process, 10 threads
run continuously to get the information for all the URIs and store them in a
queue.
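The interplay of the worker threads and the queue described above can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: the class UriWorkQueue and its members are hypothetical, the download step is passed in as a delegate so that the sketch runs without a network, and the concurrent collections require .NET Framework 4.5 or later.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;

// Hypothetical sketch: class and member names are illustrative only.
public sealed class UriWorkQueue
{
    private readonly ConcurrentQueue<string> pending = new ConcurrentQueue<string>();
    private readonly ConcurrentDictionary<string, bool> seen =
        new ConcurrentDictionary<string, bool>();
    private int outstanding; // URIs that are queued or currently being processed
    private int errors;

    public int Errors { get { return Volatile.Read(ref errors); } }

    // Adds a URI only if it has not been queued before.
    public bool Add(string uri)
    {
        if (!seen.TryAdd(uri, true))
            return false;
        Interlocked.Increment(ref outstanding);
        pending.Enqueue(uri);
        return true;
    }

    // Starts threadCount workers. Each worker takes a URI from the queue and
    // passes it to fetch, which returns the URIs found in that page.
    // Returns the number of URIs processed without error.
    public int Run(int threadCount, Func<string, IEnumerable<string>> fetch)
    {
        int processed = 0;
        var workers = new List<Thread>();
        for (int i = 0; i < threadCount; i++)
        {
            var worker = new Thread(() =>
            {
                // Stop only when nothing is queued and nothing is in progress.
                while (Volatile.Read(ref outstanding) > 0)
                {
                    string uri;
                    if (!pending.TryDequeue(out uri))
                    {
                        Thread.Sleep(1); // another worker may still add URIs
                        continue;
                    }
                    try
                    {
                        foreach (string found in fetch(uri))
                            Add(found);
                        Interlocked.Increment(ref processed);
                    }
                    catch (Exception)
                    {
                        // A real crawler would list this in its errors view.
                        Interlocked.Increment(ref errors);
                    }
                    finally
                    {
                        Interlocked.Decrement(ref outstanding);
                    }
                }
            });
            workers.Add(worker);
            worker.Start();
        }
        foreach (Thread worker in workers)
            worker.Join();
        return processed;
    }
}
```

A caller would add the starting URL with Add and then call Run(10, fetch) to match the 10 threads mentioned above. Counting outstanding URIs, rather than only checking whether the queue is empty, keeps idle workers alive while another worker is still extracting links.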
REQUEST VIEW:
Description:
While each URI is being downloaded, it is shown in the threads view. After the
download completes, the completed URI is transferred to the request view.
ERRORS VIEW:
Description:
If any difficulty or error occurs while fetching a URI belonging to the URL, it is
listed in the errors view. Information such as the internet connection status, the
number of URIs downloaded, the number of errors that occurred, the available memory,
and the CPU usage is displayed in the status bar.
Description:
While each URI is being downloaded, it is shown in the threads view. After the
download completes, the completed URI is transferred to the request view.
MIME Types:
Description:
This screen sets the kinds of data to extract from a particular URI, that is, whether
or not we need to store data, Boolean data, and image information.
Edit File Type:
Description:
This screenshot displays the MIME type and the image size.
Connections Settings:
Description:
This screenshot displays the number of threads in use, a thread wait of 2 seconds,
and a connection timeout of 20 seconds.
Output settings:
Description:
This screen sets the kinds of data to extract from a particular URI, that is, whether
or not we need to store data, Boolean data, and image information.
Advanced Settings:
Description:
These settings are made by the user to restrict certain kinds of websites, for
example those with domain names ending in .NET or .AC.IN.
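A restriction of this kind can be sketched as a suffix check on the host name of each address. The class name DomainFilter and the method name IsRestricted below are hypothetical; how the project itself applies the restriction is not shown in this report.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper; the names are for illustration only.
public static class DomainFilter
{
    // Returns true when the host of the address ends with a restricted suffix.
    public static bool IsRestricted(string address, IEnumerable<string> restrictedSuffixes)
    {
        Uri uri;
        if (!Uri.TryCreate(address, UriKind.Absolute, out uri))
            return false; // not a parsable address; left to other checks

        foreach (string suffix in restrictedSuffixes)
        {
            // Compare without regard to case, so ".NET" and ".net" both match.
            if (uri.Host.EndsWith(suffix, StringComparison.OrdinalIgnoreCase))
                return true;
        }
        return false;
    }
}
```

Keeping the leading dot in each suffix (".net", ".ac.in") avoids matching a host such as internet.com by accident.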
Description:
This screenshot displays the URLs, together with some specified folders.
Description:
This screenshot displays the folders and pages.
REPORTS
Description:
The report above shows the URI pages saved to the specified folder; all of the
information is stored in this folder.
Description:
This report displays the URL link together with its sublinks.
TECHNOLOGY SPECIFICATION
.NET Framework
The .NET Framework is an integral Windows component that supports building
and running the next generation of applications and XML Web services. The .NET
Framework is designed to fulfill the following objectives:
To provide a consistent object-oriented programming environment whether object
code is stored and executed locally, executed locally but Internet-distributed, or
executed remotely.
To provide a code-execution environment that minimizes software deployment
and versioning conflicts.
To provide a code-execution environment that promotes safe execution of code,
including code created by an unknown or semi-trusted third party.
To provide a code-execution environment that eliminates the performance
problems of scripted or interpreted environments.
To make the developer experience consistent across widely varying types of
applications, such as Windows-based applications and Web-based applications.
To build all communication on industry standards to ensure that code based on the
.NET Framework can integrate with any other code.
The .NET Framework has two main components: the common language runtime
and the .NET Framework class library. The common language runtime is the foundation
of the .NET Framework. You can think of the runtime as an agent that manages code at
execution time, providing core services such as memory management, thread
management, and remoting, while also enforcing strict type safety and other forms of
code accuracy that promote security and robustness. In fact, the concept of code
management is a fundamental principle of the runtime. Code that targets the runtime is
known as managed code, while code that does not target the runtime is known as
unmanaged code. The class library, the other main component of the .NET Framework, is
a comprehensive, object-oriented collection of reusable types that you can use to develop
applications ranging from traditional command-line or graphical user interface (GUI)
applications to applications based on the latest innovations provided by ASP.NET, such
as Web Forms and XML Web services.
The .NET Framework can be hosted by unmanaged components that load the
common language runtime into their processes and initiate the execution of managed
code, thereby creating a software environment that can exploit both managed and
unmanaged features. The .NET Framework not only provides several runtime hosts, but
also supports the development of third-party runtime hosts.
For example, ASP.NET hosts the runtime to provide a scalable, server-side
environment for managed code. ASP.NET works directly with the runtime to enable
ASP.NET applications and XML Web services, both of which are discussed later in this
topic.
Internet Explorer is an example of an unmanaged application that hosts the
runtime (in the form of a MIME type extension). Using Internet Explorer to host the
runtime enables you to embed managed components or Windows Forms controls in
HTML documents. Hosting the runtime in this way makes managed mobile code (similar
to Microsoft ActiveX controls) possible, but with significant improvements that only
managed code can offer, such as semi-trusted execution and isolated file storage.
The following illustration shows the relationship of the common language
runtime and the class library to your applications and to the overall system. The
illustration also shows how managed code operates within a larger architecture.
.NET Framework in context
The following sections describe the main components and features of the .NET
Framework in greater detail.
Features of the Common Language Runtime
The common language runtime manages memory, thread execution, code
execution, code safety verification, compilation, and other system services. These
features are intrinsic to the managed code that runs on the common language runtime.
With regards to security, managed components are awarded varying degrees of
trust, depending on a number of factors that include their origin (such as the Internet,
enterprise network, or local computer). This means that a managed component might or
might not be able to perform file-access operations, registry-access operations, or other
sensitive functions, even if it is being used in the same active application.
The runtime enforces code access security. For example, users can trust that an
executable embedded in a Web page can play an animation on screen or sing a song, but
cannot access their personal data, file system, or network. The security features of the
runtime thus enable legitimate Internet-deployed software to be exceptionally feature
rich.
The runtime also enforces code robustness by implementing a strict type-and-
code-verification infrastructure called the common type system (CTS). The CTS ensures
that all managed code is self-describing. The various Microsoft and third-party language
compilers generate managed code that conforms to the CTS. This means that managed
code can consume other managed types and instances, while strictly enforcing type
fidelity and type safety.
In addition, the managed environment of the runtime eliminates many common
software issues. For example, the runtime automatically handles object layout and
manages references to objects, releasing them when they are no longer being used. This
automatic memory management resolves the two most common application errors,
memory leaks and invalid memory references.
The runtime also accelerates developer productivity. For example, programmers
can write applications in their development language of choice, yet take full advantage of
the runtime, the class library, and components written in other languages by other
developers. Any compiler vendor who chooses to target the runtime can do so. Language
compilers that target the .NET Framework make the features of the .NET Framework
available to existing code written in that language, greatly easing the migration process
for existing applications.
While the runtime is designed for the software of the future, it also supports
software of today and yesterday. Interoperability between managed and unmanaged code
enables developers to continue to use necessary COM components and DLLs.
The runtime is designed to enhance performance. Although the common language
runtime provides many standard runtime services, managed code is never interpreted. A
feature called just-in-time (JIT) compiling enables all managed code to run in the native
machine language of the system on which it is executing. Meanwhile, the memory
manager removes the possibilities of fragmented memory and increases memory locality-
of-reference to further increase performance.
Finally, the runtime can be hosted by high-performance, server-side applications,
such as Microsoft SQL Server and Internet Information Services (IIS). This
infrastructure enables you to use managed code to write your business logic, while still
enjoying the superior performance of the industry's best enterprise servers that support
runtime hosting.
.NET Framework Class Library
The .NET Framework class library is a collection of reusable types that tightly
integrate with the common language runtime. The class library is object oriented,
providing types from which your own managed code can derive functionality. This not
only makes the .NET Framework types easy to use, but also reduces the time associated
with learning new features of the .NET Framework. In addition, third-party components
can integrate seamlessly with classes in the .NET Framework.
For example, the .NET Framework collection classes implement a set of
interfaces that you can use to develop your own collection classes. Your collection
classes will blend seamlessly with the classes in the .NET Framework.
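As a small illustration of this point, the sketch below defines a collection that implements the standard IEnumerable<T> interface, so that foreach, collection initializers, and the List<T> copy constructor all accept it. The class name UriCollection is hypothetical and is not part of the project.

```csharp
using System.Collections;
using System.Collections.Generic;

// Hypothetical collection of URI strings; the name is for illustration only.
public sealed class UriCollection : IEnumerable<string>
{
    private readonly List<string> items = new List<string>();

    public int Count
    {
        get { return items.Count; }
    }

    public void Add(string uri)
    {
        items.Add(uri);
    }

    // Generic enumerator used by foreach and by framework classes.
    public IEnumerator<string> GetEnumerator()
    {
        return items.GetEnumerator();
    }

    // Non-generic enumerator required by the IEnumerable base interface.
    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}
```

Because the class implements the framework interface, code written against IEnumerable<string> works with it without knowing the concrete type.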
As you would expect from an object-oriented class library, the .NET Framework
types enable you to accomplish a range of common programming tasks, including tasks
such as string management, data collection, database connectivity, and file access. In
addition to these common tasks, the class library includes types that support a variety of
specialized development scenarios. For example, you can use the .NET Framework to
develop the following types of applications and services:
Console applications.
Windows GUI applications (Windows Forms).
ASP.NET applications.
XML Web services.
Windows services.
For example, the Windows Forms classes are a comprehensive set of reusable
types that vastly simplify Windows GUI development. If you write an ASP.NET Web
Form application, you can use the Web Forms classes.
Client Application Development
Client applications are the closest to a traditional style of application in Windows-
based programming. These are the types of applications that display windows or forms
on the desktop, enabling a user to perform a task. Client applications include applications
such as word processors and spreadsheets, as well as custom business applications such
as data-entry tools, reporting tools, and so on. Client applications usually employ
windows, menus, buttons, and other GUI elements, and they likely access local resources
such as the file system and peripherals such as printers.
Another kind of client application is the traditional ActiveX control (now replaced
by the managed Windows Forms control) deployed over the Internet as a Web page. This
application is much like other client applications: it is executed natively, has access to
local resources, and includes graphical elements.
In the past, developers created such applications using C/C++ in conjunction with
the Microsoft Foundation Classes (MFC) or with a rapid application development (RAD)
environment such as Microsoft Visual Basic. The .NET Framework incorporates
aspects of these existing products into a single, consistent development environment that
drastically simplifies the development of client applications.
The Windows Forms classes contained in the .NET Framework are designed
to be used for GUI development. You can easily create command windows, buttons,
menus, toolbars, and other screen elements with the flexibility necessary to
accommodate shifting business needs.
For example, the .NET Framework provides simple properties to adjust visual
attributes associated with forms. In some cases the underlying operating system does not
support changing these attributes directly, and in these cases the .NET Framework
automatically recreates the forms. This is one of many ways in which the .NET
Framework integrates the developer interface, making coding simpler and more
consistent.
Unlike ActiveX controls, Windows Forms controls have semi-trusted access to a
user's computer. This means that binary or natively executing code can access some of
the resources on the user's system (such as GUI elements and limited file access) without
being able to access or compromise other resources. Because of code access security,
many applications that once needed to be installed on a user's system can now be
deployed through the Web. Your applications can implement the features of a local
application while being deployed like a Web page.
Server Application Development
Server-side applications in the managed world are implemented through runtime
hosts. Unmanaged applications host the common language runtime, which allows your
custom managed code to control the behavior of the server. This model provides you with
all the features of the common language runtime and class library while gaining the
performance and scalability of the host server.
The following illustration shows a basic network schema with managed code
running in different server environments. Servers such as IIS and SQL Server can
perform standard operations while your application logic executes through the managed
code.
Server-side managed code
ASP.NET is the hosting environment that enables developers to use the .NET
Framework to target Web-based applications. However, ASP.NET is more than just a
runtime host; it is a complete architecture for developing Web sites and Internet-
distributed objects using managed code. Both Web Forms and XML Web services use
IIS and ASP.NET as the publishing mechanism for applications, and both have a
collection of supporting classes in the .NET Framework.
XML Web services, an important evolution in Web-based technology, are
distributed, server-side application components similar to common Web sites. However,
unlike Web-based applications, XML Web services components have no UI and are not
targeted for browsers such as Internet Explorer and Netscape Navigator. Instead, XML
Web services consist of reusable software components designed to be consumed by other
applications, such as traditional client applications, Web-based applications, or even
other XML Web services. As a result, XML Web services technology is rapidly moving
application development and deployment into the highly distributed environment of the
Internet.
If you have used earlier versions of ASP technology, you will immediately notice
the improvements that ASP.NET and Web Forms offer. For example, you can develop
Web Forms pages in any language that supports the .NET Framework. In addition, your
code no longer needs to share the same file with your HTTP text (although it can
continue to do so if you prefer). Web Forms pages execute in native machine language
because, like any other managed application, they take full advantage of the runtime. In
contrast, unmanaged ASP pages are always scripted and interpreted. ASP.NET pages are
faster, more functional, and easier to develop than unmanaged ASP pages because they
interact with the runtime like any managed application.
The .NET Framework also provides a collection of classes and tools to aid in
development and consumption of XML Web services applications. XML Web services
are built on standards such as SOAP (a remote procedure-call protocol), XML (an
extensible data format), and WSDL (the Web Services Description Language). The
.NET Framework is built on these standards to promote interoperability with non-
Microsoft solutions.
For example, the Web Services Description Language tool included with the
.NET Framework SDK can query an XML Web service published on the Web, parse its
WSDL description, and produce C# or Visual Basic source code that your application can
use to become a client of the XML Web service. The source code can create classes
derived from classes in the class library that handle all the underlying communication
using SOAP and XML parsing. Although you can use the class library to consume XML
Web services directly, the Web Services Description Language tool and the other tools
contained in the SDK facilitate your development efforts with the .NET Framework.
If you develop and publish your own XML Web service, the .NET Framework
provides a set of classes that conform to all the underlying communication standards,
such as SOAP, WSDL, and XML. Using those classes enables you to focus on the logic
of your service, without concerning yourself with the communications infrastructure
required by distributed software development.
Finally, like Web Forms pages in the managed environment, your XML Web
service will run with the speed of native machine language using the scalable
communication of IIS.
C# Version 3.0 Specification
Overview of C# 3.0
C# 3.0 (C# Orcas) introduces several language extensions that build on C# 2.0
to support the creation and use of higher-order, functional-style class libraries. The
extensions enable construction of compositional APIs whose expressive power equals that
of query languages in domains such as relational databases and XML. The extensions
include:
Implicitly typed local variables, which permit the type of local variables to
be inferred from the expressions used to initialize them.
Extension methods, which make it possible to extend existing types and
constructed types with additional methods.
Lambda expressions, an evolution of anonymous methods that provides
improved type inference and conversions to both delegate types and
expression trees.
Object initializers, which ease construction and initialization of objects.
Anonymous types, which are tuple types automatically inferred and
created from object initializers.
Implicitly typed arrays, a form of array creation and initialization that
infers the element type of the array from an array initializer.
Query expressions, which provide a language integrated syntax for queries
that is similar to relational and hierarchical query languages such as SQL
and XQuery.
Expression trees, which permit lambda expressions to be represented as
data (expression trees) instead of as code (delegates).
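The extensions listed above can be seen working together in a short sketch. This is only an illustration, not code from the Website Fetcher itself: the `Page` class, the `IsHttpUrl` extension method, the `Features` class and the sample URLs are all hypothetical names chosen for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical type, used only for this illustration.
public class Page
{
    public string Url { get; set; }
    public int SizeInBytes { get; set; }
}

public static class StringExtensions
{
    // Extension method: adds IsHttpUrl to the existing string type.
    public static bool IsHttpUrl(this string s)
    {
        return s.StartsWith("http://") || s.StartsWith("https://");
    }
}

public static class Features
{
    public static List<string> LargePageUrls(int minSize)
    {
        // Implicitly typed local variable, implicitly typed array,
        // and object initializers.
        var pages = new[]
        {
            new Page { Url = "http://example.com/a", SizeInBytes = 1200 },
            new Page { Url = "ftp://example.com/b", SizeInBytes = 5000 },
            new Page { Url = "http://example.com/c", SizeInBytes = 300 }
        };

        // Query expression that projects into an anonymous type.
        var query = from p in pages
                    where p.Url.IsHttpUrl() && p.SizeInBytes >= minSize
                    orderby p.Url
                    select new { p.Url, Kilobytes = p.SizeInBytes / 1024.0 };

        // Lambda expression converted to a delegate.
        return query.Select(x => x.Url).ToList();
    }

    // The same lambda syntax converted to an expression tree (data),
    // which is then compiled back into a delegate and invoked.
    public static int DoubleViaExpressionTree(int n)
    {
        Expression<Func<int, int>> expr = x => x * 2;
        Func<int, int> compiled = expr.Compile();
        return compiled(n);
    }

    // Inspects the expression tree instead of running it.
    public static ExpressionType ExpressionBodyKind()
    {
        Expression<Func<int, int>> expr = x => x * 2;
        return expr.Body.NodeType;
    }
}
```

With the sample data above, `Features.LargePageUrls(1000)` keeps only the HTTP page of at least 1000 bytes, and `Features.ExpressionBodyKind()` reports a multiplication node, which shows that the lambda is held as data rather than as compiled code.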
C# Keywords
Keywords are predefined reserved identifiers that have special meanings to the
compiler. They cannot be used as identifiers in your program unless they include @ as a
prefix. For example, @if is a legal identifier but if is not because it is a keyword.
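A minimal sketch of the @ prefix rule follows. The class and method names (`KeywordDemo`, `Sum`) are hypothetical and exist only for this illustration.

```csharp
public static class KeywordDemo
{
    public static int Sum()
    {
        int @if = 1;      // legal: the @ prefix turns the keyword into an identifier
        int @class = 2;   // the same rule applies to any other keyword
        // int if = 1;    // would not compile: if is a reserved keyword
        return @if + @class;
    }
}
```

The @ character is not part of the name itself; it only tells the compiler to treat the token that follows as an identifier.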
CONCLUSION
Once implemented, the proposed project can connect to websites and download data
which, once indexed, can serve as input to a search engine.
The software has a graphical user interface, so the user can interact with it easily.
It is also designed so that it can be extended to search any website in a real-time
environment.
In order to fulfill this requirement we build a multithreaded Windows application that
downloads and stores the Uniform Resource Identifiers (URIs) of a typical website.
Roughly, the crawler starts off by placing an initial set of URLs in a queue, where all
URLs to be retrieved are kept and prioritized. The Fetcher takes a URL from the queue,
downloads the page, extracts any URLs it contains, and puts the new URLs in the queue.
This process is repeated until the Fetcher decides to stop.
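The queue-driven loop summarized above can be sketched as follows. This is a simplified, single-threaded illustration and not the project's actual implementation: `CrawlSketch`, `Crawl`, `fetchLinks` and `maxPages` are hypothetical names, prioritization is reduced to first-in, first-out order, and the download-and-extract step is passed in as a delegate so that the sketch does not depend on any network code.

```csharp
using System;
using System.Collections.Generic;

public static class CrawlSketch
{
    // fetchLinks is a hypothetical stand-in for "download the page
    // and extract the URLs it contains".
    public static List<string> Crawl(
        IEnumerable<string> seeds,
        Func<string, IEnumerable<string>> fetchLinks,
        int maxPages)
    {
        var queue = new Queue<string>();     // URLs waiting to be retrieved
        var seen = new HashSet<string>();    // URLs already queued at some point
        var visited = new List<string>();    // URLs retrieved, in order

        // Place the initial set of URLs in the queue.
        foreach (string seed in seeds)
        {
            if (seen.Add(seed)) queue.Enqueue(seed);
        }

        // Repeat until the queue is empty or the page limit is reached
        // (the assumed stopping rule for this sketch).
        while (queue.Count > 0 && visited.Count < maxPages)
        {
            string url = queue.Dequeue();
            visited.Add(url);

            // Put each newly discovered URL in the queue exactly once.
            foreach (string link in fetchLinks(url))
            {
                if (seen.Add(link)) queue.Enqueue(link);
            }
        }
        return visited;
    }
}
```

In a multithreaded version, several worker threads would share the queue, so access to `queue` and `seen` would need to be synchronized; that part is left out here to keep the loop itself visible.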