Date posted: 11-Jan-2016
PIC: Practical Internet Coordinates for Distance Estimation
Manuel Costa
joint work with
Miguel Castro, Ant Rowstron, Peter Key
Microsoft Research Cambridge
Why estimate distances?
• Distance estimation can be used to optimize large-scale distributed systems:
– Server selection
– Locality-aware peer-to-peer overlay networks
– Application-level multicast
• Problems with on-demand measurement:
– Slow
– High overhead
PIC
• Maps the Internet into a geometric space
• Allows very low cost distance estimation
• Fully decentralized
• Tolerates malicious nodes
Outline
• Estimating distances with coordinates
• Securing the coordinate computation process
• Application to peer-to-peer overlays
• Conclusion
Internet as a geometric space
• Map each node to a position in the geometric space
• Compute distances based on coordinates
• Any node can compute the distance between any other two nodes
• Proposed by GNP (Global Network Positioning)
[Figure: three nodes placed at (x1,y1), (x2,y2) and (x3,y3) in a space with axes x and y]
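The idea that any node can compute the distance between two other nodes from their coordinates alone can be sketched as follows. This is a minimal illustration assuming a Euclidean space; the function name `estimate_distance` and the sample coordinates are hypothetical and not taken from the slides.

```python
import math

def estimate_distance(a, b):
    """Euclidean distance between two coordinate tuples of equal dimension."""
    if len(a) != len(b):
        raise ValueError("coordinates must have the same number of dimensions")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical coordinates for three nodes (illustrative values only).
coords = {"n1": (0.0, 0.0), "n2": (30.0, 40.0), "n3": (60.0, 80.0)}

# A third party such as n1 can estimate the n2-n3 distance without probing.
print(estimate_distance(coords["n2"], coords["n3"]))  # 50.0
```

No network traffic is needed once the coordinates are known, which is where the low cost comes from.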
GNP – computing coordinates
• Measure distance to fixed landmarks
• Assign coordinates by solving a multi-dimensional global minimization problem
• There is no exact solution:
– Internet is not Euclidean
– Measurements have errors
[Figure: nodes at (x1,y1) to (x4,y4) in a space with axes x and y, with measured distances d1, d2 and d3]
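The coordinate assignment described above can be sketched as an error minimization: find the point whose distances to the landmarks best match the measured ones. The sketch below is an assumption-laden illustration, not the method used by GNP or PIC. It minimizes the sum of squared errors with plain gradient descent as a stand-in for whatever minimizer a real implementation uses, and the names `fit_coordinates`, `lr` and `steps` are hypothetical.

```python
import math
import random

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def fit_coordinates(landmarks, measured, steps=5000, lr=0.01, seed=0):
    """Return a point whose distances to `landmarks` approximate `measured`.

    Minimizes sum((estimated - measured)^2) by gradient descent
    (an assumed, simple stand-in for a real minimizer).
    """
    dims = len(landmarks[0])
    rng = random.Random(seed)
    # Start near the landmark centroid, with a little jitter.
    pos = [sum(lm[k] for lm in landmarks) / len(landmarks) + rng.uniform(-1, 1)
           for k in range(dims)]
    for _ in range(steps):
        grad = [0.0] * dims
        for lm, d in zip(landmarks, measured):
            est = euclidean(pos, lm)
            if est == 0.0:
                continue  # gradient undefined exactly on a landmark
            coeff = 2.0 * (est - d) / est
            for k in range(dims):
                grad[k] += coeff * (pos[k] - lm[k])
        pos = [p - lr * g for p, g in zip(pos, grad)]
    return pos

# Illustrative data: three landmarks and error-free measurements from (3, 4).
landmarks = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
true_pos = (3.0, 4.0)
measured = [euclidean(true_pos, lm) for lm in landmarks]
pos = fit_coordinates(landmarks, measured)
print([round(p, 3) for p in pos])
```

With real measurements the residual error does not reach zero, because, as noted above, the Internet is not Euclidean and measurements are noisy.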
PIC – computing coordinates
• Any node in the system can act as a landmark
• Strategies for choosing landmarks include:
– Random nodes
– Close nodes
– Hybrid
[Figure: nodes at (x1,y1) to (x5,y5) in a space with axes x and y, with measured distances d1, d2 and d3]
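The three landmark selection strategies could look roughly like this. The slides do not say how the hybrid strategy divides its choices, so the even split between close and random nodes below is an assumption, as are the function name `choose_landmarks` and the shape of its input.

```python
import random

def choose_landmarks(known, count, strategy, rng=None):
    """Pick `count` landmark ids from `known` (node id -> distance estimate)."""
    rng = rng or random.Random()
    ids = list(known)
    count = min(count, len(ids))
    by_distance = sorted(ids, key=lambda n: known[n])
    if strategy == "random":
        return rng.sample(ids, count)
    if strategy == "close":
        return by_distance[:count]
    if strategy == "hybrid":
        n_close = count // 2  # assumed split: half closest, half random
        rest = by_distance[n_close:]
        return by_distance[:n_close] + rng.sample(rest, count - n_close)
    raise ValueError(f"unknown strategy: {strategy}")

# Hypothetical distance estimates to already-known nodes.
known = {"a": 5, "b": 50, "c": 20, "d": 80, "e": 10, "f": 35}
print(choose_landmarks(known, 2, "close"))
print(choose_landmarks(known, 4, "hybrid", random.Random(1)))
```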
PIC – any node can act as landmark
PIC – advantages
• Self-organizing - no provisioning of servers needed
• Scalable - load distributed among all the peers
• Resilient - avoids centralized points of failure
Experimental evaluation
• 40 000 node network on 3 topologies: Georgia Tech, Mercator, Corpnet
• Compare predicted distance to real distance for 100 000 node pairs
• Euclidean space with 8 dimensions, 16 landmarks
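The comparison of predicted and real distances can be summarized as a distribution of relative errors. The slides do not define relative error, so the sketch below assumes one common definition, |predicted - real| / real; the helper names and the sample pairs are hypothetical.

```python
def relative_error(predicted, real):
    """Assumed definition: absolute error divided by the real distance."""
    return abs(predicted - real) / real

def fraction_within(pairs, threshold):
    """Fraction of (predicted, real) pairs with relative error <= threshold."""
    errors = [relative_error(p, r) for p, r in pairs]
    return sum(e <= threshold for e in errors) / len(errors)

# Illustrative (predicted, real) distances, not measured data.
pairs = [(10, 10), (12, 10), (5, 10), (31, 30)]
print(fraction_within(pairs, 0.25))  # 0.75
```

Evaluating `fraction_within` over a range of thresholds yields one curve of the kind plotted in the accuracy figures.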
Accuracy: Georgia Tech
[Figure: fraction of distances (0 to 1) against relative error (0 to 200%), with curves for GNP, random, closest and hybrid]
Accuracy over short distances
[Figure: fraction of distances (0 to 1) against relative error (0 to 200%), with curves for random, GNP, closest and hybrid]
Accuracy: CorpNet
[Figure: fraction of distances (0 to 1) against relative error (0 to 200%), with curves for GNP, random, closest and hybrid]
Accuracy: Mercator
[Figure: fraction of distances (0 to 1) against relative error (0 to 200%), with curves for GNP, random, closest and hybrid]
PIC – security
• Problem: malicious or compromised nodes can provide incorrect coordinates or fake distances
• Solution:
– Incorrect coordinates and distances are likely to violate the triangle inequality
– Remove landmarks that violate the triangle inequality
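PIC – security
• When testing landmark i against another landmark j, check:
– $d_{n,i} \le d_{i,j} + d_{n,j}$
– $d_{n,i} \ge d_{i,j} - d_{n,j}$
– $d_{n,i} \ge d_{n,j} - d_{i,j}$
• Remove landmarks with highest sum of deviations from these bounds
[Figure: triangle formed by joining node n, landmark i (under test) and landmark j, with sides $d_{n,i}$, $d_{n,j}$ and $d_{i,j}$]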
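The triangle inequality test can be sketched as follows. This is an illustration under stated assumptions rather than the PIC implementation: it assumes $d_{n,i}$ and $d_{n,j}$ are the distances the joining node measured, that $d_{i,j}$ is computed from the coordinates the landmarks report, and that one landmark is dropped per pass. The names `deviation_sums` and `drop_worst_landmark` and the `tolerance` parameter are hypothetical.

```python
import math

def deviation_sums(node_dist, landmark_coords):
    """For each landmark i, sum how far d(n,i) falls outside the bounds
    |d(i,j) - d(n,j)| <= d(n,i) <= d(i,j) + d(n,j) over all other landmarks j."""
    sums = {}
    for i in node_dist:
        total = 0.0
        for j in node_dist:
            if i == j:
                continue
            d_ni, d_nj = node_dist[i], node_dist[j]
            d_ij = math.dist(landmark_coords[i], landmark_coords[j])
            upper = d_ij + d_nj
            lower = abs(d_ij - d_nj)  # covers both lower-bound checks
            total += max(0.0, d_ni - upper) + max(0.0, lower - d_ni)
        sums[i] = total
    return sums

def drop_worst_landmark(node_dist, landmark_coords, tolerance=1e-9):
    """Remove the landmark with the highest deviation sum, if it violates any bound."""
    sums = deviation_sums(node_dist, landmark_coords)
    worst = max(sums, key=sums.get)
    if sums[worst] <= tolerance:
        return dict(node_dist)
    return {k: v for k, v in node_dist.items() if k != worst}

# Illustrative setup: joining node at (4, 3); landmark D really sits at (10, 10)
# but reports coordinates (100, 100).
reported = {"A": (0, 0), "B": (10, 0), "C": (0, 10), "D": (100, 100)}
measured = {"A": math.dist((4, 3), (0, 0)), "B": math.dist((4, 3), (10, 0)),
            "C": math.dist((4, 3), (0, 10)), "D": math.dist((4, 3), (10, 10))}
print(sorted(drop_worst_landmark(measured, reported)))  # ['A', 'B', 'C']
```

A lying landmark also inflates the deviation sums of the honest landmarks it is compared against, which is why the sketch removes only the single worst offender before any re-evaluation.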
Security evaluation
• Fraction f of colluding attackers
– Know everything
• When a node joins, attackers collude to provide a set of fake coordinates and distances that maximize the distance to the correct position
• This is a very powerful attack
Accuracy under attack
[Figure: fraction of distances (0 to 1) against relative error (0 to 140%), with curves for no attackers with security on, 10% colluding attackers and 20% colluding attackers]
Application to peer-to-peer overlays
• Structured overlays:
– Nodes have nodeIds
– Message sent to a key is delivered to node with closest nodeId
Structured overlays: Mapping keys to nodes
• large id space (128-bit integers)
• nodeIds picked randomly from space
• keys picked randomly from space
• key is managed by its root node:
– live node with id closest to the key
[Figure: id space showing nodeIds, a key and the root node for the key]
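The key-to-root mapping can be illustrated with a short sketch. It assumes the 128-bit id space wraps around, so that "closest" is measured on a ring, and that ties go to the smaller id; neither detail is stated on the slide. `ring_distance` and `root_node` are hypothetical names.

```python
ID_BITS = 128
ID_SPACE = 1 << ID_BITS

def ring_distance(a, b):
    """Shortest distance between two ids on the (assumed) circular id space."""
    d = abs(a - b) % ID_SPACE
    return min(d, ID_SPACE - d)

def root_node(key, live_node_ids):
    """Live node whose id is closest to the key; ties go to the smaller id."""
    return min(live_node_ids, key=lambda n: (ring_distance(n, key), n))

# Illustrative ids; real nodeIds would be drawn at random from the whole space.
nodes = [10, 1000, ID_SPACE - 5]
print(root_node(600, nodes))                  # 1000
print(root_node(2, nodes) == ID_SPACE - 5)    # True: wraps around the ring
```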
Pastry: Node routing state
nodeId 203231, routing table slots:
0* 1* 2* 3*
20* 21* 22* 23*
200* 201* 202* 203*
2030* 2031* 2032* 2033*
• topology aware routing table
• nodeIds and keys in some base 2^b (e.g., 4)
• prefix constraints on nodeIds for each slot
• pick closest node satisfying slot constraints
[Figure labels: leaf set, nodeId]
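The prefix constraints on slots can be made concrete with a small sketch. It assumes ids are equal-length digit strings and that a slot is identified by the length of the prefix shared with the local nodeId plus the next digit of the other id. The names `routing_slot` and `update_table` are hypothetical, and the distances are illustrative.

```python
def shared_prefix_len(a, b):
    """Number of leading digits two id strings have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def routing_slot(own_id, other_id):
    """(row, digit) of the slot `other_id` may occupy, or None for our own id."""
    row = shared_prefix_len(own_id, other_id)
    if row == len(own_id):
        return None
    return row, other_id[row]

def update_table(table, own_id, other_id, distance):
    """Keep the closest node seen so far in each slot."""
    slot = routing_slot(own_id, other_id)
    if slot is None:
        return
    current = table.get(slot)
    if current is None or distance < current[1]:
        table[slot] = (other_id, distance)

table = {}
update_table(table, "203231", "203102", 40.0)  # fits slot 2031*
update_table(table, "203231", "203133", 15.0)  # same slot, closer: replaces
print(table)  # {(3, '1'): ('203133', 15.0)}
```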
Pastry: routing
• prefix matching: each hop resolves an extra key digit
[Figure: route(m,323310) across the id space; labelled ids are 103231, 313221, 322021, 323211 and 323310, with nodeId and key markers]
Proximity neighbour selection
• Select close nodes for use in routing
• Important to achieve low delay routes
• PIC can replace network distance probes
Pastry: prefix-based routing
• Prefix matching: each hop resolves an extra key digit
• Proximity neighbour selection: use closest known node that matches an extra digit
[Figure: route(m,323310) forwarded hop by hop; labelled ids are 103231, 313221, 322021, 323211 and 323310]
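Prefix matching combined with proximity neighbour selection can be sketched like this. The sketch follows the slide's wording loosely: among known nodes that share a longer prefix with the key than the current node does, it picks the one with the smallest estimated distance. The function `next_hop` and the distance values are hypothetical; the ids are the ones shown in the figure.

```python
def shared_prefix_len(a, b):
    """Number of leading digits two id strings have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def next_hop(current_id, key, known):
    """Closest known node that matches at least one more key digit.

    `known` maps nodeId -> estimated distance from the current node.
    Returns None when no known node improves on the current prefix match.
    """
    have = shared_prefix_len(current_id, key)
    candidates = [n for n in known if shared_prefix_len(n, key) > have]
    if not candidates:
        return None
    return min(candidates, key=lambda n: known[n])

key = "323310"
# Illustrative distance estimates (e.g. derived from coordinates).
print(next_hop("103231", key, {"313221": 12.0, "322021": 45.0, "200113": 3.0}))
# 313221
print(next_hop("313221", key, {"322021": 20.0, "323211": 60.0, "310000": 5.0}))
# 322021
```

The nearby node 200113 is ignored in the first call because it shares no prefix with the key, so proximity only breaks ties among nodes that make routing progress.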
Proximity test variants
• Full probing
– RTT measured by taking the minimum of three probes
• PIC
– RTT estimated with coordinates
• Filtered probing
– Use coordinates to filter bad candidates, always probe before replacing a neighbour
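The three variants differ only in when a real probe is sent, which a short sketch can show. It assumes the decision being made is whether a candidate should replace a current neighbour with known RTT, and that filtered probing reuses the minimum-of-three measurement once a candidate passes the coordinate filter. `should_replace`, `fake_probe` and `fake_estimate` are hypothetical names with made-up values.

```python
def full_probe(probe, node, samples=3):
    """RTT taken as the minimum of `samples` probes."""
    return min(probe(node) for _ in range(samples))

def should_replace(variant, current_rtt, candidate, probe, estimate):
    """Decide whether `candidate` is closer than the current neighbour."""
    if variant == "full":
        return full_probe(probe, candidate) < current_rtt
    if variant == "pic":
        return estimate(candidate) < current_rtt  # coordinates only, no probes
    if variant == "filtered":
        if estimate(candidate) >= current_rtt:
            return False  # filtered out without sending a probe
        return full_probe(probe, candidate) < current_rtt  # confirm by probing
    raise ValueError(f"unknown variant: {variant}")

# Stand-ins for a real prober and a coordinate-based estimator.
probe_log = []

def fake_probe(node):
    probe_log.append(node)
    return 30.0

def fake_estimate(node):
    return {"near": 20.0, "far": 90.0}[node]

print(should_replace("filtered", 50.0, "far", fake_probe, fake_estimate))   # False
print(should_replace("filtered", 50.0, "near", fake_probe, fake_estimate))  # True
print(len(probe_log))  # 3: only the promising candidate was probed
```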
Trace-driven evaluation
• Dynamic node arrival and failure generated from UW Gnutella study
– 60 hour trace
– Average session time 2.3 hours
– Number of active nodes varies from 1300 to 2700
• Georgia Tech topology
Distance probes
[Figure: probes per second per node (0 to 0.3) against time (0 to 60 hours), with curves for full probing, PIC and filtered probing]
Relative delay penalty
[Figure: RDP (0 to 3.5) for full probing, filtered probing, PIC and no locality]
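The slides do not define relative delay penalty (RDP). The sketch below assumes the usual definition for overlay routing: the delay accumulated along the overlay route divided by the delay of the direct network path between the same two nodes. The function name and the sample delays are hypothetical.

```python
def relative_delay_penalty(hop_delays, direct_delay):
    """Overlay route delay divided by direct network delay (assumed definition)."""
    return sum(hop_delays) / direct_delay

# Illustrative: three overlay hops versus a direct path of 30 time units.
print(relative_delay_penalty([20, 15, 10], 30))  # 1.5
```

Under this definition a value of 1 means the overlay route is as fast as the direct path.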
Related Work
• GNP: maps Internet into geometric space using centralized landmarks
• Lighthouses: uses decentralized random landmarks
• Mithos: uses closest nodes as landmarks
• Virtual landmarks: partitions nodes into sets, maps coordinates between sets
• Vivaldi: computes coordinates continuously by passively monitoring RPC delays
Conclusion
• PIC enables practical distance estimation in large distributed systems
– Accurate
– Self-organizing
– Scalable
– Secure
• Future Work
– Deployment and evaluation on the Internet
– Different distance metrics (e.g. bandwidth)
Questions?