8/3/2019 66.Query Planning for Continuous Aggregation
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, MANUSCRIPT ID
Query Planning for Continuous Aggregation Queries over a Network of Data Aggregators
Rajeev Gupta and Krithi Ramamritham, Fellow, IEEE
Abstract: Continuous queries are used to monitor changes to time-varying data and to provide results useful for online decision making. Typically a user wants the value of some aggregation function over distributed data items, for example, the value of a client's portfolio, or the AVG of temperatures sensed by a set of sensors. In these queries, the client specifies a coherency requirement as part of the query. We present a low-cost, scalable technique to answer continuous aggregation queries using a network of aggregators of dynamic data items. In such a network of data aggregators, each data aggregator serves a set of data items at specific coherencies. Just as various fragments of a dynamic web page are served by one or more nodes of a content distribution network, our technique decomposes a client query into sub-queries and executes the sub-queries on judiciously chosen data aggregators, each with its own sub-query incoherency bound. We provide a technique for obtaining the optimal set of sub-queries, with their incoherency bounds, that satisfies the client query's coherency requirement with the least number of refresh messages sent from aggregators to the client. To estimate the number of refresh messages, we build a query cost model that estimates the number of messages required to satisfy the client-specified incoherency bound. Performance results using real-world traces show that our cost-based query planning executes queries using less than one third of the number of messages required by existing schemes.
Index Terms: Algorithms, Continuous queries, Distributed query processing, Data dissemination, Coherency, Performance.
1 INTRODUCTION
Digital Object Identifier 10.1109/TKDE.2011.12 1041-4347/11/$26.00 © 2011 IEEE
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.
1.1 Aggregate Queries and their Execution
GUPTA ET AL.: QUERY PLANNING FOR CONTINUOUS QUERIES IN DYNAMIC DATA DISSEMINATION NETWORKS 3
1.2 Problem Statement and Contributions
$$V_q(t) = \sum_{i=1}^{n_q} v_{qi}(t)\, w_{qi}$$

$$\left|\sum_{i=1}^{n_q} \left(v_{qi}(t) - u_{qi}(t)\right) w_{qi}\right| \le C_q$$

$$\sum_{i=1}^{n_q} w_{qi}\, C_{qi} \le C_q$$
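A minimal Python sketch of the three relations for weighted additive queries follows. It computes V_q(t), checks the query-level coherency condition against u_qi(t), taken here to be the value of d_qi last refreshed at the client, and checks the per-item condition sum_i w_qi C_qi <= C_q. The per-item check is sufficient because, for nonnegative weights, |sum_i w_qi (v_qi(t) - u_qi(t))| <= sum_i w_qi |v_qi(t) - u_qi(t)| <= sum_i w_qi C_qi. The function names, the list representation, and the portfolio numbers are hypothetical.

```python
from typing import Sequence


def query_value(values: Sequence[float], weights: Sequence[float]) -> float:
    """V_q(t): weighted sum of the current data item values."""
    return sum(v * w for v, w in zip(values, weights))


def query_is_coherent(source: Sequence[float], client: Sequence[float],
                      weights: Sequence[float], c_q: float) -> bool:
    """Check |sum_i (v_qi(t) - u_qi(t)) * w_qi| <= C_q."""
    drift = sum((v - u) * w for v, u, w in zip(source, client, weights))
    return abs(drift) <= c_q


def item_bounds_suffice(item_bounds: Sequence[float], weights: Sequence[float],
                        c_q: float) -> bool:
    """Sufficient per-item condition sum_i w_qi * C_qi <= C_q (weights assumed >= 0)."""
    return sum(w * c for w, c in zip(weights, item_bounds)) <= c_q


# Hypothetical portfolio: 10 shares of one stock and 5 of another.
weights = [10.0, 5.0]
source = [101.0, 50.0]   # v_qi(t): values at the data sources
client = [100.5, 50.2]   # u_qi(t): values last refreshed at the client
print(query_value(source, weights))                         # 1260.0
print(query_is_coherent(source, client, weights, c_q=5.0))  # drift is about 4.0
print(item_bounds_suffice([0.2, 0.5], weights, c_q=5.0))    # 2.0 + 2.5 <= 5.0
```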
1.3 Outline of the Paper
2 DATA DISSEMINATION COST MODEL
2.1 Incoherency Bound Model
$$P\left(|v(t) - u(t)| > C\right) \propto \frac{1}{C^{2}}$$
Figure 1. Number of pushes vs. incoherency bounds
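To see the 1/C^2 behaviour empirically, the sketch below counts pushes for a synthetic trace: the source pushes a new value whenever |v(t) - u(t)| exceeds C. The Gaussian random-walk trace and its parameters are assumptions for illustration (the paper's results use real-world traces); for such a walk, pushes * C^2 should stay roughly constant as C grows, although the exact counts depend on the seed.

```python
import random
from typing import Sequence


def count_pushes(trace: Sequence[float], c: float) -> int:
    """Push whenever the client's copy u(t) differs from v(t) by more than c."""
    last_pushed = trace[0]
    pushes = 0
    for value in trace[1:]:
        if abs(value - last_pushed) > c:
            last_pushed = value  # the client now holds the fresh value
            pushes += 1
    return pushes


# Hypothetical trace: a Gaussian random walk with a fixed seed.
rng = random.Random(42)
walk = [0.0]
for _ in range(100_000):
    walk.append(walk[-1] + rng.gauss(0.0, 0.1))

for c in (0.5, 1.0, 2.0):
    n = count_pushes(walk, c)
    print(f"C={c}: pushes={n}, pushes*C^2={n * c * c:.1f}")
```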
Symbol     Description
A          Set of aggregators in the network.
N          Number of data aggregators (DAs).
D          Set of data items disseminated by the network.
C          Incoherency bounds of data items.
a_k        kth data aggregator, 1 ≤ k ≤ N.
D_k        Set of data items disseminated by the kth DA.
d_kj       jth data item disseminated by the kth DA.
t_kj       Incoherency bound which a_k can ensure for d_kj.
q          Client query.
C_q        Incoherency bound for q.
n_q        Number of data items in q.
d_qi       ith data item of the query q.
v_qi(t)    Value of the ith data item of the query q at time t.
w_qi       Weight of the data item d_qi for the query q.
V_q(t)     Value of the query q at time t.
q_k        Sub-query of q to be executed at a_k.
C_qk       Incoherency bound of q_k.
R_q        Sumdiff of the query q.
ρ          Correlation measure between data items.
           Query satisfiability parameter.
2.2 Data Dynamics Model
$$R_s = \sum_{i} \left|s_i - s_{i-1}\right|$$
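The sumdiff R_s is the total absolute change of a series between consecutive samples, so a more dynamic data item has a larger R_s. A minimal sketch, with the function name chosen for illustration:

```python
from typing import Sequence


def sumdiff(series: Sequence[float]) -> float:
    """R_s = sum_i |s_i - s_{i-1}| over consecutive samples."""
    return sum(abs(b - a) for a, b in zip(series, series[1:]))


print(sumdiff([1, 3, 2, 2, 5]))  # 2 + 1 + 0 + 3 = 6
```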
2.3 Combining Data Dissemination Models
Figure 2. Number of pushes vs. data sumdiff: (a) C = 0.001, (b) C = 0.01, (c) C = 0.1.
3 COST MODEL FOR ADDITIVE AGGREGATION QUERIES
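$$R_{data} = w_p R_p + w_q R_q = w_p \sum_i \left|p_i - p_{i-1}\right| + w_q \sum_i \left|q_i - q_{i-1}\right|$$

$$R_{query} = \sum_i \left|w_p \left(p_i - p_{i-1}\right) + w_q \left(q_i - q_{i-1}\right)\right|$$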
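The difference between R_data and R_query is that changes in p and q can cancel inside the weighted sum. With nonnegative weights, the triangle inequality gives R_query <= R_data, with equality when both items always move in the same direction. The sketch below computes both quantities for two hypothetical series; R_query is computed as the sumdiff of the series w_p p_i + w_q q_i, which equals the sum in the definition above.

```python
from typing import Sequence


def sumdiff(series: Sequence[float]) -> float:
    """Sum of absolute changes between consecutive samples."""
    return sum(abs(b - a) for a, b in zip(series, series[1:]))


def data_sumdiff(p: Sequence[float], q: Sequence[float], w_p: float, w_q: float) -> float:
    """R_data: weighted sumdiffs of the items taken separately."""
    return w_p * sumdiff(p) + w_q * sumdiff(q)


def query_sumdiff(p: Sequence[float], q: Sequence[float], w_p: float, w_q: float) -> float:
    """R_query: sumdiff of the weighted-sum series w_p * p_i + w_q * q_i."""
    combined = [w_p * a + w_q * b for a, b in zip(p, q)]
    return sumdiff(combined)


# Two hypothetical items that always move in opposite directions by the same amount.
p = [1, 2, 1, 2]
q = [2, 1, 2, 1]
print(data_sumdiff(p, q, 1, 1), query_sumdiff(p, q, 1, 1))  # 6 0
```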
3.1 Modeling Correlation between Data Dynamics
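$$R_{query} \approx \sqrt{w_p^{2} R_p^{2} + w_q^{2} R_q^{2} + 2\rho\, w_p w_q R_p R_q}$$

$$\rho = \frac{\sum_i \left(p_i - p_{i-1}\right)\left(q_i - q_{i-1}\right)}{\sqrt{\sum_i \left(p_i - p_{i-1}\right)^{2} \sum_i \left(q_i - q_{i-1}\right)^{2}}}$$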
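A short sketch of the correlation measure and the resulting estimate, assuming ρ is computed, without mean subtraction, on the change series p_i - p_{i-1} and q_i - q_{i-1}, and that R_query ≈ sqrt(w_p^2 R_p^2 + w_q^2 R_q^2 + 2ρ w_p w_q R_p R_q). Returning 0 for a series with no changes is an assumption made here to avoid division by zero.

```python
import math
from typing import List, Sequence


def diffs(series: Sequence[float]) -> List[float]:
    """Consecutive changes s_i - s_{i-1}."""
    return [b - a for a, b in zip(series, series[1:])]


def correlation(p: Sequence[float], q: Sequence[float]) -> float:
    """Uncentered correlation of the two change series."""
    dp, dq = diffs(p), diffs(q)
    num = sum(a * b for a, b in zip(dp, dq))
    den = math.sqrt(sum(a * a for a in dp) * sum(b * b for b in dq))
    return num / den if den > 0 else 0.0  # assumption: a static series counts as uncorrelated


def approx_query_sumdiff(r_p: float, r_q: float, w_p: float, w_q: float,
                         rho: float) -> float:
    """sqrt(w_p^2 R_p^2 + w_q^2 R_q^2 + 2 rho w_p w_q R_p R_q)."""
    value = (w_p * r_p) ** 2 + (w_q * r_q) ** 2 + 2 * rho * w_p * w_q * r_p * r_q
    return math.sqrt(max(value, 0.0))  # clamp tiny negative rounding errors


p, q = [1, 2, 1, 2], [2, 1, 2, 1]
rho = correlation(p, q)
print(rho, approx_query_sumdiff(3, 3, 1, 1, rho))  # -1.0 0.0
```

For perfectly anticorrelated items with equal weighted sumdiffs, the estimate collapses to 0, matching the exact R_query for these series; for perfectly correlated items it reduces to w_p R_p + w_q R_q.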
3.2 Query-based Normalization
$$R_{query} = \sqrt{\frac{w_p^{2} R_p^{2} + w_q^{2} R_q^{2} + 2\rho\, w_p w_q R_p R_q}{w_p^{2} + w_q^{2} + 2\rho\, w_p w_q}}$$
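$$R_Q = \sqrt{\frac{\sum_{i=1}^{n_q} w_{qi}^{2} R_i^{2} + \sum_{i=1}^{n_q} \sum_{j=1, j \ne i}^{n_q} \rho_{ij}\, w_{qi} w_{qj} R_i R_j}{\sum_{i=1}^{n_q} w_{qi}^{2} + \sum_{i=1}^{n_q} \sum_{j=1, j \ne i}^{n_q} \rho_{ij}\, w_{qi} w_{qj}}}$$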
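The normalized query sumdiff for n_q items divides the correlated weighted sum of squared sumdiffs, sum_i w_qi^2 R_i^2 + sum_i sum_{j != i} ρ_ij w_qi w_qj R_i R_j, by the same expression with every R set to 1, and takes the square root. A minimal sketch, assuming ρ is supplied as a symmetric matrix of pairwise correlations:

```python
import math
from typing import Sequence


def normalized_query_sumdiff(w: Sequence[float], r: Sequence[float],
                             rho: Sequence[Sequence[float]]) -> float:
    """R_Q from item weights w, item sumdiffs r and pairwise correlations rho."""
    n = len(w)
    num = sum((w[i] * r[i]) ** 2 for i in range(n))
    den = sum(w[i] ** 2 for i in range(n))
    for i in range(n):
        for j in range(n):
            if i != j:  # the double sum over j != i counts each pair twice
                num += rho[i][j] * w[i] * w[j] * r[i] * r[j]
                den += rho[i][j] * w[i] * w[j]
    if den <= 0:
        raise ValueError("normalizing term must be positive")
    return math.sqrt(num / den)


# Two hypothetical uncorrelated items with equal weights: sqrt(12.5), about 3.54.
print(normalized_query_sumdiff([1.0, 1.0], [3.0, 4.0], [[1.0, 0.0], [0.0, 1.0]]))
```

With this normalization, a single item yields its own sumdiff regardless of its weight, and perfectly correlated items with equal sumdiffs yield that common sumdiff.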
3.3 Validating the Query Cost Model
4 QUERY PLANNING FOR WEIGHTED ADDITIVE AGGREGATION QUERIES
Figure 3: Query cost validation with varying (a) sumdiff and (b) incoherency bound.
$$Z_q = \sum_{k=1}^{N} \frac{R_{qk}}{C_{qk}^{2}}$$

$$\sum_{k=1}^{N} C_{qk} \le C_q$$

$$T_{qk} = \sum_{d_{qi} \in q_k} w_{qi}\, t_{kj}, \quad d_{kj} = d_{qi}$$

$$C_{qk} \ge T_{qk}$$
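The two constraints on a plan can be checked directly: each sub-query bound C_qk must be at least the tightest bound T_qk = sum over d_qi in q_k of w_qi t_kj that aggregator a_k can guarantee, and the sub-query bounds together must stay within C_q. In this hypothetical sketch, items are named by strings, and a dictionary maps each item to the t_kj that a_k offers for it.

```python
from typing import Dict, Sequence


def tightest_subquery_bound(weights: Dict[str, float],
                            aggregator_bounds: Dict[str, float]) -> float:
    """T_qk = sum of w_qi * t_kj over the items d_qi assigned to q_k."""
    # aggregator_bounds maps each item to the t_kj that a_k can ensure for it;
    # a KeyError means a_k does not serve that item.
    return sum(w * aggregator_bounds[item] for item, w in weights.items())


def plan_is_feasible(allocations: Sequence[float], tightest: Sequence[float],
                     c_q: float) -> bool:
    """Every C_qk >= T_qk, and the C_qk together stay within C_q."""
    meets_aggregators = all(c >= t for c, t in zip(allocations, tightest))
    return meets_aggregators and sum(allocations) <= c_q


# Hypothetical sub-query: weight 2 on item "x" and weight 1 on item "y" at one aggregator.
t_qk = tightest_subquery_bound({"x": 2.0, "y": 1.0}, {"x": 0.1, "y": 0.3})
print(t_qk)  # 2 * 0.1 + 1 * 0.3
print(plan_is_feasible([0.6, 0.4], [t_qk, 0.2], c_q=1.2))
```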
4.1 Finding Optimal Query Plan is NP-hard
4.2 Optimal Allocation of Query Incoherency Bound among Sub-queries
$$\mathcal{L} = \sum_{k=1}^{N} \frac{R_{qk}}{C_{qk}^{2}} + \lambda\left(\sum_{k=1}^{N} C_{qk} - C_q\right)$$

$$C_{qk} = \frac{C_q\, R_{qk}^{1/3}}{\sum_{k=1}^{N} R_{qk}^{1/3}}$$
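Minimizing sum_k R_qk / C_qk^2 subject to sum_k C_qk = C_q, setting the derivative with respect to each C_qk to zero gives -2 R_qk / C_qk^3 + λ = 0, so C_qk is proportional to R_qk^{1/3}, which yields C_qk = C_q R_qk^{1/3} / sum_k R_qk^{1/3}. The sketch below applies this allocation to hypothetical sub-query sumdiffs and compares its cost with another split of the same C_q.

```python
from typing import List, Sequence


def allocate_bounds(r: Sequence[float], c_q: float) -> List[float]:
    """Split C_q among sub-queries in proportion to R_qk^(1/3)."""
    cube_roots = [x ** (1 / 3) for x in r]  # sumdiffs are assumed nonnegative
    total = sum(cube_roots)
    return [c_q * cr / total for cr in cube_roots]


def plan_cost(r: Sequence[float], c: Sequence[float]) -> float:
    """Estimated refresh messages Z_q = sum_k R_qk / C_qk^2."""
    return sum(rk / ck ** 2 for rk, ck in zip(r, c))


r = [1.0, 8.0]               # hypothetical sub-query sumdiffs
c = allocate_bounds(r, 3.0)  # cube roots 1 and 2, so bounds of 1 and 2
print(c, plan_cost(r, c))    # cost 1/1 + 8/4 = 3
print(plan_cost(r, [1.1, 1.9]))  # another split of C_q = 3 costs more (about 3.04)
```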
4.3 Greedy Heuristics for Deriving the Sub-queries
4.3.1 Minimum Cost Heuristic
$$Z_q = \frac{1}{C_q^{2}} \left(\sum_{k=1}^{N} R_{qk}^{1/3}\right)^{3}$$

$$C_{qk} \ge T_{qk}$$
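Substituting the optimal allocation back into Z_q gives Z_q = (sum_k R_qk^{1/3})^3 / C_q^2, so for a fixed C_q a plan is cheaper exactly when its sum of R_qk^{1/3} is smaller. The sketch below scores a few hypothetical candidate plans with this closed form and discards any plan whose allocated bound C_qk falls below the tightest bound T_qk its aggregator can offer. Exhaustive enumeration is only for illustration; it is not the greedy algorithm of Figure 4, which is needed because finding the optimal plan is NP-hard (Section 4.1).

```python
from typing import Optional, Sequence, Tuple

# A candidate plan: (R_qk for each sub-query, T_qk for each sub-query); hypothetical format.
Plan = Tuple[Sequence[float], Sequence[float]]


def optimal_cost(r: Sequence[float], c_q: float) -> float:
    """Z_q = (sum_k R_qk^(1/3))^3 / C_q^2 under the optimal allocation."""
    return sum(x ** (1 / 3) for x in r) ** 3 / c_q ** 2


def is_satisfiable(r: Sequence[float], t: Sequence[float], c_q: float) -> bool:
    """Check C_qk = C_q R_qk^(1/3) / sum R^(1/3) >= T_qk for every sub-query."""
    total = sum(x ** (1 / 3) for x in r)
    return all(c_q * x ** (1 / 3) / total >= tk for x, tk in zip(r, t))


def cheapest_satisfiable(plans: Sequence[Plan], c_q: float) -> Optional[int]:
    """Index of the lowest-cost satisfiable plan, or None if none qualifies."""
    best, best_cost = None, float("inf")
    for idx, (r, t) in enumerate(plans):
        if is_satisfiable(r, t, c_q):
            cost = optimal_cost(r, c_q)
            if cost < best_cost:
                best, best_cost = idx, cost
    return best


plans = [
    ([1.0, 8.0], [0.5, 0.5]),  # cost 3, bounds 1 and 2 are both achievable
    ([64.0], [0.1]),           # a single sub-query, cost about 7.1
    ([1.0, 1.0], [2.0, 0.1]),  # cheapest, but 1.5 < 2.0 violates C_qk >= T_qk
]
print(cheapest_satisfiable(plans, c_q=3.0))
```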
4.3.2 Satisfiability of sub-query incoherency bound
$R_{qk}^{1/3}$
Figure 4: Greedy algorithm for query plan selection
)(3/1
3/1
mq
mm
RC
TR
+
$T_m / \left(C_q R_m^{1/3}\right)$
$C_q R_m^{1/3}$
4.3.3 Maximum Gain Heuristic
$$G_m = \frac{\sum_i w_i R_i}{\left(\sum_i w_i^{2} R_i^{2} + \sum_i \sum_{j \ne i} \rho_{ij}\, w_i w_j R_i R_j\right)^{1/2}}$$
3/1
'
)(
mq
iii
mmRC
Tw
GG
=
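As a sketch of the gain used by the maximum gain heuristic, assume G_m = sum_i w_i R_i / (sum_i w_i^2 R_i^2 + sum_i sum_{j != i} ρ_ij w_i w_j R_i R_j)^{1/2}, that is, the items' separate weighted sumdiffs divided by the sumdiff estimated for their weighted sum. Under this reading, a gain above 1 means combining the items is expected to cancel some changes, and perfectly correlated items give a gain of exactly 1. The function name and matrix format are hypothetical.

```python
import math
from typing import Sequence


def combining_gain(w: Sequence[float], r: Sequence[float],
                   rho: Sequence[Sequence[float]]) -> float:
    """Ratio of separate weighted sumdiffs to the estimated combined sumdiff."""
    n = len(w)
    separate = sum(w[i] * r[i] for i in range(n))
    combined_sq = sum((w[i] * r[i]) ** 2 for i in range(n))
    combined_sq += sum(rho[i][j] * w[i] * w[j] * r[i] * r[j]
                       for i in range(n) for j in range(n) if i != j)
    if combined_sq <= 0.0:
        return math.inf  # changes cancel completely in this estimate
    return separate / math.sqrt(combined_sq)


# Two hypothetical uncorrelated items with equal weights and sumdiffs: gain sqrt(2).
print(combining_gain([1.0, 1.0], [1.0, 1.0], [[1.0, 0.0], [0.0, 1.0]]))
```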
5 PERFORMANCE EVALUATION
5.1 Comparison of Algorithms
5.2 Effects of Algorithmic Parameters
5.2.1 Effect of data dynamics
5.2.2 Effect of correlation between data dynamics
Figure 5: Performance evaluation of algorithms
Figure 6: Effect of data sumdiff on sub-query size: (a) query size = 3, (b) query size = 5.
5.2.3 Effect of query satisfiability parameter
5.3 Overheads of Query Planning
6 QUERY PLANNING FOR MAX QUERIES
$$V_q(t) = \max\left(v_{qi}(t)\right), \quad 1 \le i \le n_q$$

$$C_{qi} \le C_q, \quad \forall i,\ 1 \le i \le n_q$$
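For a MAX query, each data item can be given the full query bound, C_qi ≤ C_q, because |max_i v_qi(t) - max_i u_qi(t)| ≤ max_i |v_qi(t) - u_qi(t)|: if every item is within C_q of its client-side copy, so is the maximum. A minimal sketch with hypothetical values:

```python
from typing import Sequence


def max_query_value(values: Sequence[float]) -> float:
    """V_q(t) for a MAX query."""
    return max(values)


def items_within_bound(source: Sequence[float], client: Sequence[float],
                       c_q: float) -> bool:
    """Per-item check |v_qi(t) - u_qi(t)| <= C_q, i.e. C_qi = C_q for every item."""
    return all(abs(v - u) <= c_q for v, u in zip(source, client))


source = [3.0, 7.0, 5.0]   # hypothetical values at the sources
client = [3.5, 6.2, 5.9]   # hypothetical values at the client
print(items_within_bound(source, client, c_q=1.0))
print(abs(max_query_value(source) - max_query_value(client)))  # at most the largest per-item error
```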
Figure 7: Effect of on query satisfiability.
Figure 8: Performance of MAX queries: (a) comparison of algorithms, (b) effect of data dynamics order on performance.
6.1 Query Cost Model
$$R_q = \sum_{i=1}^{n_q} R_i \cdot p\left(x_i > \max\left(x_j \mid j \ne i,\ 1 \le j \le n_q\right)\right)$$
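One way to apply this cost model to traces is to estimate p(x_i > max_{j != i} x_j) as the fraction of time steps in which item i holds the maximum, then weight each item's sumdiff R_i by that fraction. The empirical-frequency estimator and the tie-breaking rule below are assumptions for illustration, not necessarily how the paper estimates the probability.

```python
from typing import List, Sequence


def sumdiff(series: Sequence[float]) -> float:
    """Sum of absolute changes between consecutive samples."""
    return sum(abs(b - a) for a, b in zip(series, series[1:]))


def estimate_max_query_sumdiff(traces: Sequence[Sequence[float]]) -> float:
    """R_q estimate: sum_i R_i * (fraction of steps where item i is the maximum)."""
    n_steps = len(traces[0])  # all traces are assumed to have equal length
    wins: List[int] = [0] * len(traces)
    for t in range(n_steps):
        column = [trace[t] for trace in traces]
        wins[column.index(max(column))] += 1  # ties go to the lowest index
    return sum(sumdiff(tr) * w / n_steps for tr, w in zip(traces, wins))


# Hypothetical traces: the first item is the maximum at every step.
print(estimate_max_query_sumdiff([[5, 6, 5, 6], [1, 2, 1, 2]]))
```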
6.2 Optimized Execution
6.2.1 Optimal query planning problem is NP-hard
6.2.2 Greedy Heuristics
=qid
diqdiq RRG )max(
6.2.3 Simulation results
7 RELATED WORK
8 DISCUSSION & CONCLUSION
Rajeev Gupta received his BTech in Electronics Engineering from the Indian Institute of Technology (IIT) Kharagpur, India. He is currently pursuing his PhD in Computer Science at IIT Mumbai, India. He has been working as a researcher at IBM Research, New Delhi, India, for the last 10 years.

Krithi Ramamritham received the PhD in Computer Science from the University of Utah and then joined the University of Massachusetts. He is currently a Professor in the Department of Computer Science at IIT Bombay. He is a fellow of the IEEE and a fellow of the ACM. He has served on numerous program committees of conferences and workshops. His editorial board contributions include IEEE Transactions, the Real Time Systems Journal, and the VLDB Journal.