Network-Aware Query Processing for Stream-based Applications
Yanif Ahmad, Ugur Cetintemel (Brown University)
VLDB 2004
One-line comment: this paper addresses the operator placement problem in distributed query processing by using network latency information.
Contents
• Motivation
• Problem
• Solution Approach
• Central Version of Algorithm: Edge, Edge+, In-Network, Latency-Constrained
• Distributed Version of Algorithm
• Experiment
• Critique
Motivation
• Small-scale query processing systems are not scalable
• There are large numbers of data streams and query requests
• => Widely distributed query processing is needed
Problem: the operator placement problem
• The operators of a query processing tree should be dispersed across the network
[Figure: processing tree (query plan) with root O00, intermediate operators O10 and O11, and leaf operators O20 to O26, mapped onto the operator nodes and application node of an IP network]
Problem: formalized version of the operator placement problem
For efficient operator placement, the cost to minimize is bandwidth:
$$\min \sum_{a \in A} c(a), \qquad T = (O, A),\ G = (V, E)$$
O: operators; A: the arcs connecting their inputs and outputs; V: nodes; E: their links; c(·): link cost (bandwidth)
• $c(a) = 0$ if $\lambda(m) = \lambda(n)$ for $a = (m, n)$, where $\lambda$ maps each operator to a node
• Source operators' locations are given
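The objective can be checked by brute force on a toy instance. The sketch below is a minimal illustration, assuming the nonzero cost of an arc is simply its bandwidth (rate) and that every non-source operator may be put on any node; the operators, nodes and rates are made up. It enumerates all |V|^k mappings of the k free operators, which is exactly the blow-up that makes the naïve approach impractical on larger trees.

```python
from itertools import product
from typing import Dict, List, Optional, Tuple

Arc = Tuple[str, str]  # (m, n): stream from operator m to operator n


def total_cost(arcs: List[Arc], placement: Dict[str, str], bw: Dict[Arc, float]) -> float:
    """Sum of c(a) over A; c(a) = 0 if both ends share a node, else bw[a] (assumed)."""
    return sum(0.0 if placement[m] == placement[n] else bw[(m, n)] for m, n in arcs)


def optimal_placement(arcs: List[Arc], bw: Dict[Arc, float], sources: Dict[str, str],
                      free_ops: List[str],
                      nodes: List[str]) -> Tuple[Optional[Dict[str, str]], float]:
    """Exhaustive search over all len(nodes) ** len(free_ops) mappings."""
    best: Optional[Dict[str, str]] = None
    best_cost = float("inf")
    for choice in product(nodes, repeat=len(free_ops)):
        placement = {**sources, **dict(zip(free_ops, choice))}  # sources stay fixed
        cost = total_cost(arcs, placement, bw)
        if cost < best_cost:
            best, best_cost = placement, cost
    return best, best_cost


# Hypothetical instance: three sources, two intermediate operators, one root.
arcs = [("O20", "O10"), ("O21", "O10"), ("O22", "O11"), ("O10", "O00"), ("O11", "O00")]
bw = {("O20", "O10"): 30.0, ("O21", "O10"): 50.0, ("O22", "O11"): 20.0,
      ("O10", "O00"): 40.0, ("O11", "O00"): 10.0}
sources = {"O20": "S0", "O21": "S0", "O22": "S1"}
best, cost = optimal_placement(arcs, bw, sources, ["O10", "O11", "O00"], ["S0", "S1"])
print(best, cost)  # O10 and O00 at S0, O11 at S1, cost 10.0
```

Here 2 nodes and 3 free operators give only 8 mappings, but the count grows exponentially with the number of operators.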
Solution Approach: network-aware operator placement algorithms
• Edge: considers only the source locations and the proxy location
• Edge+: Edge extended with pairwise server communication latencies
• In-Network: considers the sources, the proxy, and a subset of all locations
• Latency-bound algorithm: placement under an end-to-end latency constraint
Algorithm Design Principle
• Naïve algorithm for operator placement: evaluate every combination of possible mappings => too complex
• Greedy algorithm: evaluate only the most promising locations
• Place operators in post-order (children before parents); when an operator is placed, its candidate locations are derived from its children's placements
[Figure: processing tree with root O00, operators O10 and O11, and leaves O20 to O26, mapped onto an IP network of operator nodes and an application node, with sources S0 and S1]
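A sketch of the greedy skeleton, assuming the tree is stored as an operator-to-children dictionary. The `choose` hook and the `follow_first_child` rule are hypothetical: they stand in for whichever candidate rule (Edge, Edge+, In-Network) picks a location once all children are placed. The naïve mapping count is included for comparison.

```python
from typing import Callable, Dict, List

Tree = Dict[str, List[str]]  # operator -> children; sources have no entry


def post_order(tree: Tree, root: str) -> List[str]:
    """List operators children-first, so each child is placed before its parent."""
    order: List[str] = []

    def visit(o: str) -> None:
        for c in tree.get(o, []):
            visit(c)
        order.append(o)

    visit(root)
    return order


def naive_mapping_count(num_free_operators: int, num_locations: int) -> int:
    """Exhaustive search tries every location for every non-source operator."""
    return num_locations ** num_free_operators


def greedy_place(tree: Tree, root: str, sources: Dict[str, str],
                 choose: Callable[[str, List[str], Dict[str, str]], str]) -> Dict[str, str]:
    """One pass in post-order; sources keep their fixed locations."""
    placement: Dict[str, str] = {}
    for o in post_order(tree, root):
        children = tree.get(o, [])
        placement[o] = choose(o, children, placement) if children else sources[o]
    return placement


tree = {"O00": ["O10", "O11"], "O10": ["O20", "O21"], "O11": ["O22", "O23"]}
sources = {"O20": "S0", "O21": "S0", "O22": "S1", "O23": "S1"}


def follow_first_child(o: str, children: List[str], placement: Dict[str, str]) -> str:
    """Hypothetical rule: put o where its first child is."""
    return placement[children[0]]


print(post_order(tree, "O00"))  # ['O20', 'O21', 'O10', 'O22', 'O23', 'O11', 'O00']
print(greedy_place(tree, "O00", sources, follow_first_child))
print(naive_mapping_count(3, 4))  # 64
```

The greedy pass evaluates each operator once, while the exhaustive search multiplies the choices of all free operators.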
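Mapping Function
$$\lambda(o) = \arg\min_{v \in \Lambda(o)} \phi(o, v)$$
where $\Lambda(o)$ is the set of candidate locations for $o$ and $\phi(o, v)$ sums, over the children $c_i$ of $o$, a $\min\{\cdot\}$ over alternative link-cost terms $c(\cdot, \cdot)$ involving $v$ and the children's placements $\lambda(c_i)$.
[Figure: processing tree with root O00, operators O10, O11, O12, and leaves O20 to O29]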
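Edge
Location candidates: sources and the proxy. Most promising candidates:
(1) One of the children's locations
(2) A common location
(3) The proxy's location
Link cost:
$$c(m, n) = \begin{cases} 0 & \text{if } \lambda(m) = \lambda(n) \\ \tau(m, n) & \text{otherwise} \end{cases}$$
where $\tau(m, n)$ is the tree cost of arc $(m, n)$.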
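A minimal sketch of the Edge link cost and of scoring one operator's candidate locations, assuming the candidates are the locations of its children plus the proxy, and that the operator's cost at a location is the sum of the link costs to its children. The operator names, tree costs and proxy name `P` are hypothetical.

```python
from typing import Dict


def edge_link_cost(tau: float, loc_m: str, loc_n: str) -> float:
    """Edge: c(m, n) = 0 if lambda(m) == lambda(n), else the tree cost tau(m, n)."""
    return 0.0 if loc_m == loc_n else tau


def edge_candidate_costs(child_locs: Dict[str, str], tau: Dict[str, float],
                         proxy: str) -> Dict[str, float]:
    """Score each candidate (the children's locations and the proxy) by the
    summed link cost to the operator's children."""
    candidates = sorted(set(child_locs.values()) | {proxy})
    return {v: sum(edge_link_cost(tau[c], loc, v) for c, loc in child_locs.items())
            for v in candidates}


costs = edge_candidate_costs({"O20": "S0", "O21": "S0", "O22": "S1"},
                             {"O20": 30.0, "O21": 50.0, "O22": 20.0}, proxy="P")
print(costs)  # {'P': 100.0, 'S0': 20.0, 'S1': 80.0}
```

Because Edge's link cost only asks whether two operators share a location, any two distinct locations look equally far apart.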
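Edge (1) One of the children's locations
The child location that maximizes the total tree cost of the arcs between the operator and the children placed there:
$$\ell'(o) = \arg\max_{v \in \{\lambda(c_i) : 1 \le i \le n\}} \sum_{c_i : \lambda(c_i) = v} \tau(c_i, o)$$
[Figure: processing tree with root O00, operators O10, O11, O12, and leaves O20 to O29 at S0, S0, S1, S0, S1, S1, S2, S0, S1, S1. Detail: O10 with children O20, O21, O22 and tree costs 30, 50, 20, over sources S0 and S1]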
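Computing ℓ'(o) only needs the children's locations and tree costs. The sketch assumes the costs 30, 50 and 20 in the figure belong to O20, O21 and O22, placed at S0, S0 and S1 respectively; that reading of the figure is an assumption.

```python
from collections import defaultdict
from typing import Dict, List


def best_child_location(children: List[str], placement: Dict[str, str],
                        tree_cost: Dict[str, float]) -> str:
    """l'(o): the child location whose children carry the most total tree cost.

    tree_cost[c] is the cost of the arc from child c to o.
    """
    totals: Dict[str, float] = defaultdict(float)
    for c in children:
        totals[placement[c]] += tree_cost[c]
    # Ties go to the location seen first among the children.
    return max(totals, key=totals.get)


placement = {"O20": "S0", "O21": "S0", "O22": "S1"}
tree_cost = {"O20": 30.0, "O21": 50.0, "O22": 20.0}
print(best_child_location(["O20", "O21", "O22"], placement, tree_cost))  # S0 (80 vs. 20)
```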
Edge (2) A common location
Idea: placing an operator and its children at a common location gives zero overlay cost between the operator and its children.
A common location (cl) is a good place for all of the children: it lies in the intersection of every child's dl (the set of descendant leaf locations).
$$dl(o) = \bigcup_{l \in leaves(o)} \lambda(l), \qquad cl(o) = \bigcap_{i=1}^{n} dl(c_i)$$
[Figure: processing tree with root O00, operators O10, O11, O12, and leaves O20 to O29 at S0, S0, S1, S0, S1, S1, S2, S0, S1, S1]
Example: dl(O11) = {S0, S1, S2}, cl(O00) = {S0, S1}
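The dl and cl definitions can be checked against the slide's example. This sketch assumes the figure groups the leaves as O10: O20 to O22, O11: O23 to O26, O12: O27 to O29, with the leaf locations listed in order; under that reading it reproduces dl(O11) = {S0, S1, S2} and cl(O00) = {S0, S1}.

```python
from typing import Dict, List, Set

Tree = Dict[str, List[str]]

# Assumed shape of the slide's example tree (reading order of the figure).
TREE: Tree = {
    "O00": ["O10", "O11", "O12"],
    "O10": ["O20", "O21", "O22"],
    "O11": ["O23", "O24", "O25", "O26"],
    "O12": ["O27", "O28", "O29"],
}
LEAF_LOC = dict(zip([f"O2{i}" for i in range(10)],
                    ["S0", "S0", "S1", "S0", "S1", "S1", "S2", "S0", "S1", "S1"]))


def leaves(tree: Tree, o: str) -> List[str]:
    """All leaf operators in the subtree rooted at o."""
    kids = tree.get(o, [])
    if not kids:
        return [o]
    return [leaf for c in kids for leaf in leaves(tree, c)]


def dl(tree: Tree, o: str, loc: Dict[str, str]) -> Set[str]:
    """dl(o): set of locations of the leaves below o."""
    return {loc[leaf] for leaf in leaves(tree, o)}


def cl(tree: Tree, o: str, loc: Dict[str, str]) -> Set[str]:
    """cl(o): intersection of dl(c_i) over the children c_i of o."""
    return set.intersection(*(dl(tree, c, loc) for c in tree[o]))


print(sorted(dl(TREE, "O11", LEAF_LOC)))  # ['S0', 'S1', 'S2']
print(sorted(cl(TREE, "O00", LEAF_LOC)))  # ['S0', 'S1']
```

Note that cl(o) can be empty: O10's children sit at S0, S0 and S1 individually, so no single location is common to all three.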
Edge (3) The proxy's location
Idea: if tree costs are higher near the root, choose the proxy location r.
[Figure: the same processing tree, leaves O20 to O29 at S0, S0, S1, S0, S1, S1, S2, S0, S1, S1]
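Edge: Summary
$$\lambda(o) = \begin{cases} DHT(o) & \text{if } o \in leaves(T) \\ \arg\min_{v \in \{\ell'(o)\} \cup cl(o) \cup \{r\}} \phi(o, v) & \text{otherwise} \end{cases}$$
Leaf (source) operators are placed at the location given by $DHT(o)$; every other operator takes the cheapest of the three candidate kinds, where $\phi(o, v)$ is the cost of the links between $o$ placed at $v$ and its children.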
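Putting the three candidates together gives a greedy Edge pass. The sketch below is not the paper's exact cost function: it assumes φ(o, v) is just the Edge link cost of o's child arcs, plus a hypothetical `output_cost` charged when the root is not at the proxy so that candidate (3) can win. The `sources` dictionary stands in for DHT(o).

```python
from collections import defaultdict
from typing import Dict, List, Set

Tree = Dict[str, List[str]]


def edge_place(tree: Tree, root: str, sources: Dict[str, str],
               tree_cost: Dict[str, float], proxy: str,
               output_cost: float = 0.0) -> Dict[str, str]:
    """Greedy Edge placement in post-order over candidates {l'(o)} | cl(o) | {r}.

    Assumptions (not from the slides): phi(o, v) is the Edge link cost of o's
    child arcs, and the root pays output_cost unless it sits at the proxy.
    tree_cost[c] is the cost of the arc from child c to its parent.
    """
    placement: Dict[str, str] = dict(sources)  # leaves: stand-in for DHT(o)
    dl: Dict[str, Set[str]] = {}

    def phi(o: str, v: str) -> float:
        cost = sum(0.0 if placement[c] == v else tree_cost[c] for c in tree[o])
        if o == root and v != proxy:
            cost += output_cost
        return cost

    def visit(o: str) -> None:
        kids = tree.get(o, [])
        if not kids:
            dl[o] = {placement[o]}
            return
        for c in kids:
            visit(c)
        totals: Dict[str, float] = defaultdict(float)
        for c in kids:
            totals[placement[c]] += tree_cost[c]
        best_child = max(totals, key=totals.get)           # l'(o)
        common = set.intersection(*(dl[c] for c in kids))  # cl(o)
        candidates = sorted({best_child, proxy} | common)  # sorted for deterministic ties
        placement[o] = min(candidates, key=lambda v: phi(o, v))
        dl[o] = set().union(*(dl[c] for c in kids))

    visit(root)
    return placement


tree = {"O10": ["O20", "O21", "O22"]}
sources = {"O20": "S0", "O21": "S0", "O22": "S1"}
tree_cost = {"O20": 30.0, "O21": 50.0, "O22": 20.0}
print(edge_place(tree, "O10", sources, tree_cost, "P", output_cost=10.0)["O10"])   # S0
print(edge_place(tree, "O10", sources, tree_cost, "P", output_cost=200.0)["O10"])  # P
```

With a cheap output stream the operator stays at S0 (cost 30 vs. 100); once the output is heavier than the inputs, the proxy wins, which matches the intuition of candidate (3).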
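Edge+
Location candidates: sources and the proxy. Edge extended with the network latency d between two locations.
Link cost:
$$c(m, n) = \begin{cases} 0 & \text{if } \lambda(m) = \lambda(n) \\ \tau(m, n)\, d(\lambda(m), \lambda(n)) & \text{otherwise} \end{cases}$$
Mapping function: the same as in Edge,
$$\lambda(o) = \begin{cases} DHT(o) & \text{if } o \in leaves(T) \\ \arg\min_{v \in \{\ell'(o)\} \cup cl(o) \cup \{r\}} \phi(o, v) & \text{otherwise} \end{cases}$$
with $\phi(o, v)$, the cost of the links between $o$ placed at $v$ and its children, now built from the latency-weighted link cost.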
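To see why latency matters, the sketch compares the Edge and Edge+ link costs on a made-up three-site example, assuming latencies are stored per unordered pair of locations. Edge picks one of the sources because it ignores how far apart they are, while Edge+ moves the operator to the proxy P, which is close to both sources.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

Latency = Dict[FrozenSet[str], float]


def edge_cost(tau: float, a: str, b: str, d: Latency) -> float:
    """Edge: c = 0 if co-located, else the tree cost tau (latency ignored)."""
    return 0.0 if a == b else tau


def edge_plus_cost(tau: float, a: str, b: str, d: Latency) -> float:
    """Edge+: c = 0 if co-located, else tau * d(a, b)."""
    return 0.0 if a == b else tau * d[frozenset((a, b))]


def best_site(children: List[Tuple[str, float]], candidates: List[str],
              cost: Callable[[float, str, str, Latency], float], d: Latency) -> str:
    """Pick the candidate minimizing the summed child-link cost."""
    return min(sorted(candidates),
               key=lambda v: sum(cost(tau, loc, v, d) for loc, tau in children))


# Hypothetical numbers: two equally heavy children on far-apart sources,
# and a proxy P close to both.
d = {frozenset(("S0", "S1")): 100.0, frozenset(("S0", "P")): 10.0,
     frozenset(("S1", "P")): 10.0}
children = [("S0", 30.0), ("S1", 30.0)]
print(best_site(children, ["S0", "S1", "P"], edge_cost, d))       # S0
print(best_site(children, ["S0", "S1", "P"], edge_plus_cost, d))  # P
```

Under Edge+ the costs are 600 at P versus 3000 at either source.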
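In-Network Placement
Location candidates: arbitrary locations (including the sources and the proxy). The overlay cost and mapping function are the same as in Edge+:
$$c(m, n) = \begin{cases} 0 & \text{if } \lambda(m) = \lambda(n) \\ \tau(m, n)\, d(\lambda(m), \lambda(n)) & \text{otherwise} \end{cases}$$
with $\phi(o, v)$, the cost of the links between $o$ placed at $v$ and its children, built from this link cost.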
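Problem: reducing the candidate location set
$$\lambda(o) = \begin{cases} DHT(o) & \text{if } o \in leaves(T) \\ \arg\min_{v \in \Gamma(o)} \phi(o, v) & \text{otherwise} \end{cases}$$
where $\Gamma(o)$ is the reduced candidate set for $o$.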
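In-Network Placement Approach
Remove a location unless its distance to every current child placement is less than every pairwise distance between child placements:
$$\Gamma(o) = \{ v \in V : \forall c_i, c_j \in C,\ d(v, \lambda(c_i)) < d(\lambda(c_i), \lambda(c_j)) \wedge d(v, \lambda(c_j)) < d(\lambda(c_i), \lambda(c_j)) \}$$
where $C$ is the set of children of $o$.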
[Figure: pruning example for O00 with children O10, O11, O12 over network nodes N2, N4, N7, N8; link latencies 20, 30, 30, 40, 50, 60]
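A sketch of the pruning rule, assuming a latency function d(u, v); here hypothetical 2-D coordinates stand in for latencies. Identical child locations are deduplicated first, a choice made here because with the strict inequality two co-located children would otherwise remove every candidate. Under this literal reading a child's own location never survives, since its distance to the other child equals the pairwise distance.

```python
import math
from itertools import combinations
from typing import Callable, List


def prune_candidates(nodes: List[str], child_locs: List[str],
                     d: Callable[[str, str], float]) -> List[str]:
    """Keep v only if, for every pair of distinct child locations (a, b),
    v is closer to both a and b than a and b are to each other."""
    pairs = list(combinations(sorted(set(child_locs)), 2))
    return [v for v in nodes
            if all(d(v, a) < d(a, b) and d(v, b) < d(a, b) for a, b in pairs)]


# Hypothetical 2-D coordinates standing in for latencies.
coords = {"A": (0.0, 0.0), "B": (10.0, 0.0), "M": (5.0, 1.0), "F": (20.0, 0.0)}


def dist(u: str, v: str) -> float:
    return math.dist(coords[u], coords[v])


print(prune_candidates(["M", "F"], ["A", "B"], dist))  # ['M']
```

With a single child location there are no pairs, so every node is kept.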
Latency-Constrained Placement
Goal: find a configuration that satisfies a latency constraint.
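$P$: the set of leaf-to-root paths. A configuration is feasible if
$$\forall p \in P:\ \sum_{(a,b) \in p} d(\lambda(a), \lambda(b)) \le l$$
Accumulated delay below operator $o$, and the candidate locations that keep every child within the bound:
$$ad(o) = \max_{p \in paths(subtree(o))} \sum_{(a,b) \in p} d(\lambda(a), \lambda(b))$$
$$L(o) = \{ v \in V : ad(c_i) + d(v, \lambda(c_i)) \le l \text{ for every child } c_i \text{ of } o \}$$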
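$$\lambda(o) = \begin{cases} DHT(o) & \text{if } o \in leaves(T) \\ \arg\min_{v \in L(o)} \phi(o, v) & \text{otherwise} \end{cases}$$
where $\phi(o, v)$ is the cost of the links between $o$ placed at $v$ and its children.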
[Figure: example with operator O over children O20, O21, O22 at sources S0 and S1, candidate nodes N4, N5, N7, and latencies 30, 30, 50]
If l = 75
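A sketch of the latency-constrained candidate set, assuming a symmetric latency function and computing ad(o) recursively from the children's placements. The bound l = 75 matches the slide; the tree, placements and latencies (in ms) are made up.

```python
from typing import Callable, Dict, List

Tree = Dict[str, List[str]]


def accumulated_delay(o: str, tree: Tree, placement: Dict[str, str],
                      d: Callable[[str, str], float]) -> float:
    """ad(o): the largest summed latency on any leaf-to-o path (0 for a leaf)."""
    kids = tree.get(o, [])
    if not kids:
        return 0.0
    return max(accumulated_delay(c, tree, placement, d) + d(placement[c], placement[o])
               for c in kids)


def feasible_locations(o: str, tree: Tree, placement: Dict[str, str], nodes: List[str],
                       d: Callable[[str, str], float], bound: float) -> List[str]:
    """L(o): nodes v with ad(c_i) + d(v, lambda(c_i)) <= bound for every child c_i."""
    return [v for v in nodes
            if all(accumulated_delay(c, tree, placement, d) + d(v, placement[c]) <= bound
                   for c in tree[o])]


# Hypothetical symmetric latencies in ms.
LAT = {frozenset(p): t for p, t in [
    (("S0", "N4"), 30.0), (("S1", "N4"), 50.0), (("N5", "N4"), 20.0),
    (("N5", "S0"), 40.0), (("N7", "N4"), 30.0), (("N7", "S0"), 35.0)]}


def d(a: str, b: str) -> float:
    return 0.0 if a == b else LAT[frozenset((a, b))]


tree = {"O00": ["O10", "O22"], "O10": ["O20", "O21"]}
placement = {"O20": "S0", "O21": "S1", "O22": "S0", "O10": "N4"}
print(accumulated_delay("O10", tree, placement, d))                      # 50.0
print(feasible_locations("O00", tree, placement, ["N5", "N7"], d, 75.0))  # ['N5']
```

N7 is rejected because the path through O10 would reach 50 + 30 = 80 ms, above the 75 ms bound.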
Distributed Query Placement
Reason: the centralized approach is not scalable
• It requires substantial network state
• The algorithm's complexity is high
Distributed Query Placement
[Figure: processing tree with operators O1 to O4, the application proxy, and coordinators C1 to C4]
• Partition the processing tree into subtrees (zones)
• Assign each zone to a coordinator node
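The slides do not say how zones are formed. As a purely hypothetical illustration, the sketch cuts the tree at a fixed depth, turns each subtree below the cut into a zone, keeps the operators above the cut in a root zone, and gives each zone its own coordinator; `partition_zones` and `assign_coordinators` are names invented here.

```python
from typing import Dict, List

Tree = Dict[str, List[str]]


def partition_zones(tree: Tree, root: str, cut_depth: int) -> Dict[str, List[str]]:
    """Each operator at depth cut_depth starts a new zone holding its whole
    subtree; operators above the cut stay in the root zone."""
    zones: Dict[str, List[str]] = {root: []}

    def visit(o: str, depth: int, zone: str) -> None:
        if depth == cut_depth and o != root:
            zone = o
            zones[zone] = []
        zones[zone].append(o)
        for c in tree.get(o, []):
            visit(c, depth + 1, zone)

    visit(root, 0, root)
    return zones


def assign_coordinators(zones: Dict[str, List[str]]) -> Dict[str, str]:
    """Hypothetical assignment: one coordinator C1, C2, ... per zone."""
    return {zone: f"C{i + 1}" for i, zone in enumerate(zones)}


tree = {"O1": ["O2", "O3"], "O3": ["O4"]}
zones = partition_zones(tree, "O1", cut_depth=1)
print(zones)                       # {'O1': ['O1'], 'O2': ['O2'], 'O3': ['O3', 'O4']}
print(assign_coordinators(zones))  # {'O1': 'C1', 'O2': 'C2', 'O3': 'C3'}
```

Each coordinator can then run the centralized placement on its own zone only, which keeps the network state it needs small.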
Distributed Query Placement
[Figure: coordinators C1 to C4 organized as a tree overlay]
Experiment: Experimental Setup
• Processing tree: binary tree, depth 3 to 5
• Network topology: maximum pairwise path delay of 500 ms
• Server and proxy locations: Uniform: APD = ASD; Star: APD = 0.5 × ASD; Cluster: APD = 2 × ASD
  (APD: average proxy distance; ASD: average server distance)
[Figure: server and proxy layouts for the Uniform, Cluster, and Star configurations]
Experiment
• Latency constraints: 120 ms (0.9nd, tight) vs. 300 ms (2.2nd, loose)
• Direct comparison baseline: all operators are placed at the proxy
Results
[Figure: bandwidth consumption and latency stretch]
Critique
Pros
• Addresses the operator placement problem with a focus on network-related costs (bandwidth, latency) rather than processing cost
Cons
• High-complexity algorithm: is it practical to apply?
  • Heavy processing: completing the placement takes too much time
  • Latency information for many locations is needed
  • Placement converges sequentially, in a bottom-up manner
  => Impractical for complex query plans and topologies; a simpler algorithm would be more appropriate
• Dynamic environments? Not resilient to dynamic topology changes, such as nodes leaving or latencies changing