1. INTRODUCTION
1.1 OBJECTIVE :
• The main objective is to compute approximate answers, in less time, for systems
that face dynamic node failures.
• We perform query processing over a peer-to-peer (P2P) network.
1.2 EXISTING SYSTEM :
The existing system uses a structured P2P network with a Distributed
Hash Table (DHT). Exact query processing is possible in the existing system,
but it has certain disadvantages.
A structured network is organized in such a way that data items are
located at specific nodes in the network, and those nodes maintain
state information to enable efficient retrieval of the data.
Structured networks are not efficient or flexible enough for
applications in which nodes join or leave the network frequently.
A sequential algorithm is used, which increases the latency of the
system.
Since the nodes are selected sequentially, if any node gets
disconnected the exact answer is not received, and identifying the
disconnected node becomes tedious and sometimes impossible.
1.3 PROPOSED SYSTEM :
The proposed system uses an unstructured P2P network.
No assumptions are made about the location of data items within the
nodes.
Nodes can join at random times and depart without prior notification.
We use approximate query processing to reduce latency, which is the
aim of our project.
The project can be run while nodes are dynamically added and
removed.
2. FEASIBILITY STUDY
2.1 SYSTEM ANALYSIS :
As P2P systems mature beyond file sharing applications and start
getting deployed in increasingly sophisticated e-business and
scientific environments, the vast amount of data within P2P databases
poses a different challenge that has not been adequately researched.
Aggregation queries have the potential of finding applications in
decision support, data analysis, and data mining. For example,
millions of peers across the world may be cooperating on a grand
experiment in astronomy, and astronomers may be interested in asking
queries that require the aggregation of vast amounts of data covering
thousands of peers.
There is real-world value for aggregation queries in Intrusion
Detection Systems, and application signature analysis in P2P
networks.
2.2 SYSTEM REQUIREMENTS :
2.2.1 HARDWARE REQUIREMENTS :
Hard disk : 40 GB
RAM : 512 MB
Processor Speed : 3.00GHz
Processor : Pentium IV Processor
2.2.2 SOFTWARE REQUIREMENTS :
Front End : VS .NET 2005
Code Behind : C#.net
Back End : SQL SERVER 2000
2.3 FLOW CHART:
The data flow diagram (fig. 2.1) explains the working of the project in
detail. After the registration and login process, the query is issued from the
query node. Internally, a node is selected at random, and the data is
retrieved from it and stored.
FIGURE 2.1 The Initial Process (flow chart: Start → Login at the query node →
Peer Lister → SQL Server-connected peers → two-phase sampling → visited nodes
(active peers) and unvisited nodes (inactive peers) → random walk over the active
nodes → pass the aggregate rules → select a table (Products or Order Details) →
calculate the probability for the active nodes → estimate the result for the
inactive nodes → generate the report)
3. MODULE DESCRIPTION :
This project consists of six modules:
Sign in
Peerlister
Activepeers
Aggregation
Viewtable
Report
Sign In:
Users register their user names and passwords here. Using the login
form, they can then enter the query processing unit.
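A minimal sketch of how the Sign In module might validate a user against the register table (uname, upwd) of the aqp database with ADO.NET is given below; the connection string and the class and method names are illustrative assumptions, not the project's actual code.

using System;
using System.Data.SqlClient;

static class SignIn
{
    // Illustrative connection string; the real server name and credentials are project-specific.
    const string ConnStr = "Data Source=.;Initial Catalog=aqp;Integrated Security=True";

    // Returns true when the given user name and password exist in the register table.
    public static bool Validate(string uname, string upwd)
    {
        using (SqlConnection conn = new SqlConnection(ConnStr))
        using (SqlCommand cmd = new SqlCommand(
            "SELECT COUNT(*) FROM register WHERE uname = @u AND upwd = @p", conn))
        {
            cmd.Parameters.AddWithValue("@u", uname);
            cmd.Parameters.AddWithValue("@p", upwd);
            conn.Open();
            return (int)cmd.ExecuteScalar() > 0;
        }
    }
}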
Peerlister:
Peerlister lists the peers that are connected to the query node.
FIGURE 3.1 Peer Lister (flow chart: Login → Peer Lister on success, back to
Login on a login error; Peer Lister connects to Peer 1 … Peer 5)
Activepeers:
This module gets all the peers that are connected to the SQL server.
All SQL Server-connected peers are placed in the visited group;
the remaining peers are kept in the unvisited group.
After this, the two-phase sampling is carried out.
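A hedged sketch of this step follows, reading the two phases as (1) partitioning the listed peers into visited and unvisited groups and (2) drawing a random subset of the visited peers; the string peer identifiers and the reachability test are illustrative assumptions, not the project's actual types.

using System;
using System.Collections.Generic;

static class TwoPhaseSampling
{
    // Phase 1: probe every listed peer and split the list into visited
    // (SQL Server reachable) and unvisited peers.
    public static void Partition(List<string> peers,
                                 Predicate<string> isSqlServerReachable,
                                 out List<string> visited,
                                 out List<string> unvisited)
    {
        visited = new List<string>();
        unvisited = new List<string>();
        foreach (string peer in peers)
        {
            if (isSqlServerReachable(peer)) visited.Add(peer);
            else unvisited.Add(peer);
        }
    }

    // Phase 2: draw a uniform random subset of the visited peers; only these
    // are touched by the random walk and the aggregation step.
    public static List<string> Sample(List<string> visited, int sampleSize, Random rng)
    {
        List<string> pool = new List<string>(visited);
        List<string> sample = new List<string>();
        while (sample.Count < sampleSize && pool.Count > 0)
        {
            int i = rng.Next(pool.Count);
            sample.Add(pool[i]);
            pool.RemoveAt(i);
        }
        return sample;
    }
}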
FIGURE 3.2 SQL connected peers (Peer Lister → SQL Server-connected peers,
with disconnected peers set aside)
FIGURE 3.3 Two phase sampling (connected peers → random walk from the nodes →
segment into two phases)
Aggregation (Process):
The aggregate rules are passed to the table selected from the Northwind
database, for the peers in the visited group.
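For example, an aggregate rule might be sent to one visited peer's Northwind database with ADO.NET roughly as follows; the connection details and the chosen aggregate (SUM of Quantity over Order Details) are illustrative.

using System;
using System.Data.SqlClient;

static class Aggregation
{
    // Runs a SUM aggregate on the [Order Details] table of one visited peer's
    // Northwind database and returns that peer's partial result.
    public static double SumQuantity(string peerAddress)
    {
        // Illustrative connection string; each visited peer exposes its own SQL Server.
        string connStr = "Data Source=" + peerAddress +
                         ";Initial Catalog=Northwind;Integrated Security=True";
        using (SqlConnection conn = new SqlConnection(connStr))
        using (SqlCommand cmd = new SqlCommand(
            "SELECT SUM(Quantity) FROM [Order Details]", conn))
        {
            conn.Open();
            object result = cmd.ExecuteScalar();
            return result == DBNull.Value ? 0.0 : Convert.ToDouble(result);
        }
    }
}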
Viewtable:
This module enables us to view the tables, and their respective fields,
in any database.
Report module:
The report presents the two sampling phases and the visited and
unvisited peers in a chart, along with the response time recorded for
each peer.
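One simple way to combine the per-peer partial results into the approximate answer reported here is to scale the visited peers' total by the fraction of peers visited. The estimator below is a hedged illustration of that idea, not necessarily the exact formula used in the project.

using System.Collections.Generic;

static class Report
{
    // Approximate aggregate: scale the total collected from the visited peers up
    // by the ratio of all peers to visited peers, so that the unvisited (inactive)
    // peers are accounted for in the final estimate.
    public static double Estimate(List<double> partialSums, int totalPeers)
    {
        if (partialSums.Count == 0) return 0.0;
        double visitedSum = 0.0;
        foreach (double s in partialSums) visitedSum += s;
        return visitedSum * totalPeers / partialSums.Count;
    }
}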
4. LITERATURE REVIEW
4.1 GENERAL
P2P systems are becoming very popular because they provide an
efficient mechanism for building large scalable systems.
Recent work has developed powerful techniques for employing
sampling in the database engine to approximate aggregation queries
and to estimate database statistics.
Recent techniques have focused on providing formal foundations and
algorithms for block-level sampling and are thus most relevant to our
work. The objective in block-level sampling is to derive a
representative sample by randomly selecting a set of disk blocks of a
relation.
4.2 GOAL OF THE PROJECT
Given an aggregation query at a query node, compute, with minimum
cost, an approximate answer to this query with the least error.
4.3 CHALLENGES FACED
Picking even a set of uniform random peers is a difficult problem, as
the query node does not have the Internet Protocol (IP) addresses of
all peers in the network. This is a well-known problem that other
researchers have tackled (in different contexts) by using random-walk
techniques on the P2P graph. That is, a Markovian random walk is
initiated from the query node that picks adjacent peers to visit with
equal probability; under certain connectivity properties, the random
walk is expected to reach a stationary distribution rapidly. If the
graph is badly clustered with small cuts, this slows the rate at which
the walk converges.
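A minimal sketch of such a Markovian random walk over the peer graph follows; the adjacency-list representation and the fixed number of steps are assumptions made for illustration.

using System;
using System.Collections.Generic;

static class RandomWalk
{
    // Starting from the query node, repeatedly jump to a uniformly chosen
    // neighbour; after enough steps the position approximates a draw from the
    // walk's stationary distribution, and the peer reached is taken as the sample.
    public static string SamplePeer(Dictionary<string, List<string>> neighbours,
                                    string queryNode, int steps, Random rng)
    {
        string current = queryNode;
        for (int i = 0; i < steps; i++)
        {
            List<string> adj = neighbours[current];
            if (adj.Count == 0) break;              // isolated peer: the walk stops here
            current = adj[rng.Next(adj.Count)];     // pick an adjacent peer with equal probability
        }
        return current;
    }
}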
Even if we could select a peer (or a set of peers) uniformly at random,
this does not make the problem of selecting a uniform random set of
tuples much easier. This is because visiting a peer at random has an
associated overhead; thus, it makes sense to select multiple tuples at
random from that peer during the same visit. However, this may
compromise the quality of the final set of tuples retrieved, as the
tuples within the same peer are likely to be correlated.
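For instance, several random tuples can be pulled from a peer in one visit by ordering on NEWID() in SQL Server; the table and column below are placeholders taken from the Northwind example, not the project's actual schema.

using System;
using System.Collections.Generic;
using System.Data.SqlClient;

static class TupleSampler
{
    // Pulls n random rows from a peer's [Order Details] table during a single
    // visit, amortising the cost of contacting that peer over several tuples.
    public static List<int> SampleQuantities(SqlConnection openConn, int n)
    {
        List<int> quantities = new List<int>();
        string sql = "SELECT TOP " + n + " Quantity FROM [Order Details] ORDER BY NEWID()";
        using (SqlCommand cmd = new SqlCommand(sql, openConn))
        using (SqlDataReader reader = cmd.ExecuteReader())
        {
            while (reader.Read())
                quantities.Add(Convert.ToInt32(reader[0]));
        }
        return quantities;
    }
}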
4.4 THE PEER TO PEER MODEL
Each peer p is identified by the processor's IP address and a port
number (IP_p and port_p).
The peer p is also characterized by the capabilities of the processor on
which it is located, including its CPU speed (p_cpu), memory
bandwidth (p_mem), and disk space (p_disk).
The node also has a limited amount of bandwidth in the network, say
p_band.
In unstructured P2P networks, a node becomes a member of the
network by establishing a connection with at least one peer currently
in the network. Each node maintains a small number of connections
with its peers.
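This description maps naturally onto a small data structure. The class below is an illustrative model whose field names mirror the text (IP_p, port_p, p_cpu, p_mem, p_disk, p_band); it is not taken from the project's code.

using System.Collections.Generic;

// Illustrative model of a peer in the unstructured P2P network.
class Peer
{
    public string Ip;              // IP_p: address identifying the peer
    public int Port;               // port_p
    public double CpuSpeed;        // p_cpu
    public double MemoryBandwidth; // p_mem
    public double DiskSpace;       // p_disk
    public double Bandwidth;       // p_band: network bandwidth available to the peer

    // A node joins the network by connecting to at least one existing peer
    // and keeps only a small number of such connections.
    public List<Peer> Connections = new List<Peer>();
}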
4.5 QUERY COST MEASURE
The primary cost measure that we consider is latency, which is
the time that it takes to propagate the query across multiple peers and receive
replies at the query node.
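A sketch of how this latency could be measured at the query node is shown below; the delegate that sends the query and waits for the replies is a hypothetical placeholder.

using System;
using System.Diagnostics;

static class QueryCost
{
    // Latency of one query: the time from sending the query out of the query
    // node until all replies have been received back at the query node.
    public static TimeSpan MeasureLatency(Action sendQueryAndAwaitReplies)
    {
        Stopwatch watch = Stopwatch.StartNew();
        sendQueryAndAwaitReplies();   // propagate the query and block until the replies arrive
        watch.Stop();
        return watch.Elapsed;
    }
}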
5. SYSTEM ENVIRONMENT:
5.1 FRONT END USED:
Microsoft Visual Studio .NET is used as the front-end tool. The
reasons for selecting Visual Studio .NET as the front-end tool are as follows.
FEATURES OF MICROSOFT VISUAL STUDIO .NET:
Visual Studio .NET is flexible, allowing one or more languages
to interoperate in a single solution. This cross-language
compatibility allows us to complete projects at a faster rate.
Visual Studio .NET has the Common Language Runtime, which
allows all components to be compiled into one intermediate
format and then interact.
Visual Studio .NET provides excellent security when an
application is executed on the system.
Visual Studio .NET is flexible, allowing us to configure the
working environment to best suit our individual style. We can
choose between single and multiple document interfaces, and
we can adjust the size and positioning of the various IDE
elements.
Visual Studio .NET has intelligent features that make coding
easy, and its Dynamic Help reduces coding time.
The working environment in Visual Studio .NET is often referred
to as an Integrated Development Environment because it integrates
many different functions, such as design, editing, compiling and
debugging, within a common environment.
After creating a Visual Studio .NET application, if we want to
distribute it to others, we can freely distribute it to anyone
who uses Microsoft Windows. We can distribute our
applications on disk, on CDs, across networks, or over an
intranet or the Internet.
Toolbars provide quick access to commonly used commands in
the programming environment. We click a button on the toolbar
once to carry out the action represented by that button. By
default, the standard toolbar is displayed when we start Visual
Basic .NET. Additional toolbars for editing, form design, and
debugging can be toggled on or off from the Toolbars command
on the View menu.
Many parts of Visual Studio are context sensitive. Context
sensitive means we can get help on these parts directly without
having to go through the help menu. For example, to get help on
any keyword in the Visual Basic language, place the insertion
point on that keyword in the code window and press F1.
Visual Studio interprets our code as we enter it, catching and
highlighting most syntax or spelling errors on the fly. It’s almost
like having an expert watching over our shoulder as we enter our
code.
5.2 BACK END USED:
Microsoft SQL Server 2000 is used as the back-end tool. The reasons
for selecting SQL Server 2000 as the back-end tool are as follows:
FEATURES OF SQL SERVER 2000
The OLAP Services feature available in SQL Server version 7.0
is now called SQL Server 2000 Analysis Services. The term OLAP
Services has been replaced with the term Analysis Services. Analysis
Services also includes a new data mining component. The Repository
component available in SQL Server version 7.0 is now called Microsoft
SQL Server 2000 Meta Data Services. References to the component now
use the term Meta Data Services. The term repository is used only in
reference to the repository engine within Meta Data Services.
A SQL Server database consists of five types of objects.
They are:
1. TABLE
2. QUERY
3. FORM
4. REPORT
5. MACRO
1) TABLE:
A table is a collection of data about a specific topic.
We can view a table in two ways:
a) Design View
b) Datasheet View
a) Design View
To build or modify the structure of a table, we work in the table's
design view. We can specify what kind of data the table will hold.
b) Datasheet View
To add, edit, or analyse the data itself, we work in the table's datasheet
view.
2) QUERY:
A query is a question that is asked to obtain the required data.
Access gathers the data that answers the question from one or more tables.
The data that makes up the answer is either a dynaset (which can be edited) or a
snapshot (which cannot be edited). Each time we run a query, we get the latest
information in the dynaset. Access either displays the dynaset or snapshot
for us to view, or performs an action on it, such as deleting or updating.
3) FORMS:
A form is used to view and edit information in the database, record by
record. A form displays only the information we want to see, in the way we
want to see it. Forms use familiar controls such as textboxes and checkboxes,
which makes viewing and entering data easy. We can work with forms in
several views. Primarily there are two views.
They are,
a) Design View
b) Form View
a) Design View
To build or modify the structure of a form, we work in the form's design
view. We can add controls to the form that are bound to fields in a table or
query, including textboxes, option buttons, graphs and pictures.
b) Form View
The form view displays the whole design of the form.
4) REPORT:
A report is used to view and print information from the database.
The report can group records at many levels and compute totals and
averages by checking values from many records at once. The report is also
attractive and distinctive because we have control over its size and
appearance.
5) MACRO:
A macro is a set of actions. Each action in a macro does something,
such as opening a form or printing a report. We write macros to automate
common tasks so that they are carried out easily and save time.
6. SYSTEM TESTING & MAINTENANCE
6.1 TESTING :
6.1.1 SYSTEM TESTING:
Testing is done for each module. After testing all the modules,
the modules are integrated, and testing of the final system is done with
test data specially designed to show that the system will operate
successfully in all conditions. Procedure-level testing is done first. By
giving improper inputs, the errors that occur are noted and eliminated. Thus
system testing is a confirmation that everything is correct and an
opportunity to show the user that the system works. The final step involves
validation testing, which determines whether the software functions as the
user expects. The end user, rather than the system developer, conducts this
test.
Most software developers have a process called alpha and beta testing
to uncover the defects that only the end user seems able to find. This is the
final step in the system life cycle. Here we implement the tested, error-free
system in a real-life environment and make the necessary changes so that it
runs in an online fashion. System maintenance is done every month or year,
based on company policy, and the system is checked for errors such as runtime
errors and long-run errors, along with other maintenance tasks such as table
verification and reports.
6.1.2 UNIT TESTING:
Unit testing verifies the smallest unit of software design, the
module. This is known as "module testing". The modules are tested
separately. This testing is carried out during the programming stage itself. In
this testing step, each module is found to work satisfactorily with regard to
the expected output from the module.
6.1.3 INTEGRATION TESTING:
Integration testing is a systematic technique for constructing
tests to uncover errors associated with the interface. In this project, all the
modules are combined and then the entire program is tested as a whole. In
the integration-testing step, all the errors uncovered are corrected before the
next testing step.
6.1.4 VALIDATION TESTING:
Validation testing uncovers functional errors, that is, it checks whether
the functional characteristics conform to the specification or not.
6.2 SYSTEM MAINTENANCE :
The objective of this maintenance work is to make sure that the
system remains in working order at all times, without any bugs. Provision
must be made for environmental changes which may affect the computer or
software system. This is called maintenance of the system. Nowadays there
is rapid change in the software world, and the system should be capable of
adapting to these changes, so maintenance plays a vital role. The system
should be designed to accommodate all new changes without affecting its
performance or accuracy.
7. CONCLUSION & FUTURE ENHANCEMENT
7.1 CONCLUSION :
Our approach requires a minimal number of communications over the
network and provides tunable parameters to maximize performance for
various network topologies.
Our approach provides a powerful technique for approximating
aggregates over various topologies and data clusterings, but it comes with
limitations that depend on a given topology's structure and connectivity.
For topologies with very distinct clusters of peers, it becomes
increasingly difficult to obtain accurate random samples because of the
inability of the random-walk process to reach all clusters quickly.
7.2 FUTURE ENHANCEMENT :
The APPROXIMATE query processing may be enhanced to EXACT
query processing, which at present poses many difficulties because an
unstructured network is used instead of a structured one, and also because
of congestion, high latency, and the difficulty posed when nodes frequently
join or leave the network without prior notification.
The approximate query processing technique used in this project
decreases the latency, which is the major consideration here, ahead of
accuracy.
8. SNAPSHOTS
FIGURE 8.1 Main Form
FIGURE 8.2 Register
FIGURE 8.3 Login
FIGURE 8.4 PeerLister
FIGURE 8.5 SQL Connected Peers
FIGURE 8.6 Two Phase Sampling
FIGURE 8.7 Random Nodes
FIGURE 8.8 View Table & Fields
FIGURE 8.9 Aggregation Rules
FIGURE 8.10
FIGURE 8.11
FIGURE 8.12
FIGURE 8.13 Report
FIGURE 8.14
FIGURE 8.15 Error Rate
FIGURE 8.16
TABLES :
Table 1 Table register in aqp Database
  Column Name   Data Type   Length
  uname         varchar     50
  upwd          varchar     50

Table 2 Table peers in aqp Database
  Column Name   Data Type   Length
  pid           int         4
  peername      varchar     50

Table 3 Table visitpeers in aqp Database
  Column Name   Data Type   Length
  vpid          int         4
  npname        varchar     50

Table 4 Table unvisitpeers in aqp Database
  Column Name   Data Type   Length
  vpid          int         4
  npname        varchar     50

Table 5 Table revisit in aqp Database
  Column Name   Data Type   Length
  vpid          int         4
  vpname        varchar     50
  res           varchar     50
  stime         varchar     50
  etime         varchar     50
  resptime      varchar     50
  err           varchar     50

Table 6 Table apValue in aqp Database
  Column Name   Data Type   Length
  pid           int         4
  vpname        varchar     50
  prob          nvarchar    50
  fname         varchar     50
  aggregate     varchar     50

Table 7 Table errorrate in aqp Database
  Column Name   Data Type   Length
  prob          varchar     50
  sprob         varchar     50
REFERENCES:
[1] S. Acharya, P.B. Gibbons, and V. Poosala, "Aqua: A Fast Decision Support System Using Approximate Query Answers," Proc. 25th Int'l Conf. Very Large Data Bases (VLDB '99), 1999.
[2] L. Adamic, R. Lukose, A. Puniyani, and B. Huberman, "Search in Power-Law Networks," Physical Rev. E, 2001.
[3] B. Babcock, S. Chaudhuri, and G. Das, "Dynamic Sample Selection for Approximate Query Processing," Proc. 22nd ACM SIGMOD Int'l Conf. Management of Data (SIGMOD '03), pp. 539-550, 2003.
[4] A.R. Bharambe, M. Agrawal, and S. Seshan, "Mercury: Supporting Scalable Multi-Attribute Range Queries," Proc. ACM Ann. Conf. Applications, Technologies, Architectures, and Protocols for Computer Comm. (SIGCOMM '04), 2004.
[5] S. Boyd, A. Ghosh, B. Prabhakar, and D. Shah, "Analysis and Optimization of Randomized Gossip Algorithms," Proc. 43rd IEEE Conf. Decision and Control (CDC '04), 2004.
[6] S. Boyd, A. Ghosh, B. Prabhakar, and D. Shah, "Gossip and Mixing Times of Random Walks on Random Graphs," Proc. IEEE INFOCOM '05, 2005.
[7] M. Charikar, S. Chaudhuri, R. Motwani, and V. Narasayya, "Towards Estimation Error Guarantees for Distinct Values," Proc. 19th ACM Symp. Principles of Database Systems (PODS '00), 2000.
[8] S. Chaudhuri, G. Das, M. Datar, R. Motwani, and V. Narasayya, "Overcoming Limitations of Sampling for Aggregation Queries," Proc. 17th IEEE Int'l Conf. Data Eng. (ICDE '01), pp. 534-542, 2001.