Facility Location in Dynamic Geometric Data Streams
Christiane Lammersen, Christian Sohler
Dynamic Geometric Data Streams
• Streams of geometric data arise in
  – Mobile networks
  – Sensor networks
  – …
• Continuously changing data
  – Mobile networks: position of nodes
  – Sensor networks: measured data
• Communication in form of update operations
  – Update consists of ID of node, old value, new value
IITK Workshop on Algorithms for Processing Massive Data Sets (Christiane Lammersen)

Hierarchical Communication Systems
• Upper layer offers the lower layer a certain service
• Each node can be a server
• Cost for server ↔ access time

Dynamic Geometric Data Streams
• $m$ insert and delete operations
• Points in low-dimensional, discrete space $\{1, \ldots, \Delta\}^d$
• $\mathrm{polylog}(\Delta, m)$ memory space, one pass
[Indyk '04]
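To make the input model concrete, here is a minimal sketch that replays a stream of insert and delete operations and stores the surviving points exactly. It is only a reference for what the stream represents: it uses linear memory, not the polylogarithmic memory the model allows. The names `apply_stream` and `move` are hypothetical, and treating a node update (ID, old value, new value) as a delete followed by an insert is an assumption about how the two slides fit together.

```python
from collections import Counter

def apply_stream(ops, delta, d):
    """Replay insert/delete operations and return the surviving multiset of points."""
    points = Counter()
    for kind, p in ops:
        # Points live in the discrete space {1, ..., delta}^d.
        assert len(p) == d and all(1 <= x <= delta for x in p)
        if kind == "insert":
            points[p] += 1
        elif kind == "delete":
            if points[p] == 0:
                raise ValueError(f"delete of absent point {p}")
            points[p] -= 1
            if points[p] == 0:
                del points[p]
        else:
            raise ValueError(f"unknown operation {kind}")
    return points

def move(old, new):
    """Assumed reading of an update (ID, old value, new value): delete old, insert new."""
    return [("delete", old), ("insert", new)]

ops = [("insert", (3, 5)), ("insert", (16, 1))] + move((3, 5), (4, 5))
current = apply_stream(ops, delta=16, d=2)
```

After the replay, `current` holds the points (4, 5) and (16, 1), each once.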
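
Dynamic Uniform FLP
• Point set $P$
• Facilities have uniform opening cost $f$
• Clients have uniform demand $b$
• Goal: maintain $F \subseteq P$ so as to minimize
$$f \cdot |F| + b \cdot \sum_{p \in P} \min_{q \in F} \lVert p - q \rVert$$
• FLP is related to $k$-Median, but $|F|$ can be $\Omega(|P|)$, so storing a solution is a problem in streaming; the goal is therefore an approximation of the cost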
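The objective can be evaluated directly for a given facility set. The sketch below does so by brute force; it is an offline illustration of the cost function only, not part of the streaming method. The name `flp_cost` is hypothetical, and the Euclidean distance is an assumption, since the slide's distance notation did not survive extraction.

```python
import math

def flp_cost(points, facilities, f, b):
    """f * |F| + b * (sum over p in P of the distance from p to its nearest facility)."""
    if not facilities:
        raise ValueError("at least one facility is needed")
    # Assumption: Euclidean distance between a client and a facility.
    connection = sum(min(math.dist(p, q) for q in facilities) for p in points)
    return f * len(facilities) + b * connection

P = [(1, 1), (1, 2), (9, 9)]
one_facility = flp_cost(P, [(1, 1)], f=8, b=1)
two_facilities = flp_cost(P, [(1, 1), (9, 9)], f=8, b=1)
```

In this toy instance the second facility pays for itself: opening it costs 8 but saves the long connection from (9, 9).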

Related Work
• P. Indyk: Algorithms for Dynamic Geometric Problems over Data Streams, STOC 04
  – $O(\log^2 \Delta)$-approximation for cost of FLP
  – Idea: nested square grids, open facility in all heavy cells
• G. Frahling and C. Sohler: Coresets in Dynamic Geometric Data Streams, STOC 05
  – Space partition based on heavy cells

Construction of Our Streaming Method
• Deterministic method: $E_{det}(P) = \Theta(\mathrm{OPT}(P))$
• Randomized method: $E_{rand}(P) = \Theta(E_{det}(P))$
• Streaming method: $E_{stream}(P) = \Theta(E_{rand}(P))$

Deterministic Method
• Impose $\log(\Delta)+1$ nested square grids
• In each grid, identify the heavy cells
• Partition the input space based on the heavy cells
• For each cell size, count the number of points within cells of that size => estimator for cost:
$$E_{det}(P) = \sum_{i=0}^{\log \Delta} \sum_{C \in SP_i} n(C) \cdot 2^i$$
[Indyk '04, Frahling and Sohler '05]
Idea: Open one facility in each heavy cell in the space partition.

Nested Grids
• Impose $\log(\Delta)+1$ nested square grids
[Figure: the grids for $\Delta = 16$, shown one level at a time from level 4 down to level 0]
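A point can be located in every grid with a bit shift. The sketch below assumes that cells in level $i$ have side length $2^i$, so that level $\log \Delta$ is a single cell and level 0 consists of unit cells; this matches the $\Delta = 16$ figure with levels 4 to 0, but the indexing scheme and the names `cell_of` and `num_levels` are hypothetical.

```python
def cell_of(p, level):
    """Index of the level-`level` cell containing p (assumed cell side: 2**level)."""
    # Coordinates start at 1, so shift to 0-based before dividing by 2**level.
    return tuple((x - 1) >> level for x in p)

def num_levels(delta):
    """Number of grids, log2(delta) + 1; delta is assumed to be a power of two."""
    return delta.bit_length()

delta = 16
p = (6, 13)
cells = [cell_of(p, i) for i in range(num_levels(delta))]
```

For the point (6, 13) this yields one cell index per level, ending in the single top-level cell (0, 0).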
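
Space Partition
• In each grid, identify the heavy cells
• Partition the input space based on the heavy cells
• A cell in level $i$ is heavy if it contains at least $f / 2^i$ points.
[Figure: example with $f = 8$ and $\Delta = 16$, refined one level at a time from level 4 down to level 0]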
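The heaviness test can be sketched offline by counting points per cell in every level. This is an exact, non-streaming illustration; `heavy_cells` and `cell_of` are hypothetical names, cells in level $i$ are assumed to have side length $2^i$, and the five points are made up for the example.

```python
from collections import Counter

def cell_of(p, level):
    """Index of the level-`level` cell containing p (assumed cell side: 2**level)."""
    return tuple((x - 1) >> level for x in p)

def heavy_cells(points, delta, f):
    """For each level i, the set of cells that contain at least f / 2**i points."""
    heavy = []
    for i in range(delta.bit_length()):  # levels 0 .. log2(delta)
        counts = Counter(cell_of(p, i) for p in points)
        # n >= f / 2**i, written without division to stay in integers.
        heavy.append({c for c, n in counts.items() if n * 2**i >= f})
    return heavy

points = [(1, 1), (1, 2), (2, 1), (2, 2), (9, 9)]
heavy = heavy_cells(points, delta=16, f=8)
```

With $f = 8$ the thresholds are 8, 4, 2, 1 and 1/2 points for levels 0 to 4, so the four clustered points make their cell heavy from level 1 upwards, while the isolated point only sits in a heavy cell from level 3 upwards.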
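
Cost Estimator
• For each cell size, count the number of points within cells of that size => estimator for cost:
$$E_{det}(P) = \sum_{i=0}^{\log \Delta} \sum_{C \in SP_i} n(C) \cdot 2^i$$
• Example from the figure: 0 points in level-0 cells, 9 points in level-1 cells, 7 points in level-2 cells:
$$0 \cdot 2^0 + 9 \cdot 2^1 + 7 \cdot 2^2 = 46$$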
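The estimator can be sketched offline as follows. The slides do not spell out how the partition is derived from the heavy cells, so this sketch assumes a top-down refinement: a cell belongs to the partition if its parent is heavy (or it is the top cell) and it is itself light or in level 0. The point set is made up so that the per-level counts match the example (0, 9 and 7 points for $f = 8$, $\Delta = 16$); it is not the point set of the original figure, and `e_det` and `cell_of` are hypothetical names.

```python
from collections import Counter

def cell_of(p, level):
    """Index of the level-`level` cell containing p (assumed cell side: 2**level)."""
    return tuple((x - 1) >> level for x in p)

def e_det(points, delta, f):
    """Return (E_det, points per level) under the assumed top-down partition."""
    top = delta.bit_length() - 1  # level log2(delta); delta assumed a power of two
    counts = [Counter(cell_of(p, i) for p in points) for i in range(top + 1)]

    def heavy(cell, i):
        return counts[i][cell] * 2**i >= f  # at least f / 2**i points

    per_level = Counter()
    for i in range(top + 1):
        for cell, n in counts[i].items():  # empty cells contribute nothing
            parent = tuple(x >> 1 for x in cell)
            reached = i == top or heavy(parent, i + 1)  # parent was subdivided
            kept = i == 0 or not heavy(cell, i)         # cell is not subdivided further
            if reached and kept:
                per_level[i] += n
    return sum(n * 2**i for i, n in per_level.items()), dict(per_level)

# Three level-2 cells with 3 spread-out points each, seven level-2 cells with 1 point each.
clusters = [(1 + s, 1 + s) for s in (0, 4, 8)] \
         + [(3 + s, 3 + s) for s in (0, 4, 8)] \
         + [(1 + s, 3 + s) for s in (0, 4, 8)]
singles = [(1, 5), (1, 9), (1, 13), (5, 1), (9, 1), (13, 1), (13, 13)]
estimate, by_level = e_det(clusters + singles, delta=16, f=8)
```

Here `estimate` is 46, with 9 points counted in level 1 and 7 points in level 2, the same arithmetic as in the example above.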

Value of Cost Estimator is $\Omega(\mathrm{OPT}(P))$
• Contribution of a heavy cell $C$ in level $i$ is at most [bound not recoverable from the source]
• Contribution of a light cell $C$ in level $i$ is at most [bound not recoverable from the source]
• A heavy cell in level $i$ contains $\Omega(f / 2^i)$ points.
• The space partition is balanced.
• The distance of a cell in level $i$ to a heavy cell is $O(2^i)$.

Value of Cost Estimator is $O(\mathrm{OPT}(P))$
• Contribution of a distant cell $C$ in level $i$ is at least $n(C) \cdot 2^{i-1}$
• $\mathrm{OPT}(P) \ge f \cdot |F_{OPT}|$
• Estimated cost for a near cell $C$ in level $i$ is $n(C) \cdot 2^i = O(f)$
• There is a constant number of near cells.
• Estimated cost for near cells is $O(f \cdot |F_{OPT}|)$
[Figure: cell in level $i$, radius $2^{i-1}$]

Randomized Method
Idea:
  – A heavy cell in level $i$ contains at least $f / 2^i$ points
  – Sample a point in level $i$ with probability $2^i / f$
Problem: coin flips & delete operations
Solution:
  – Hash function $h_i : \{1, \ldots, \Delta\}^d \to \{1, \ldots, f / 2^i\}$
  – Sample set $S_i = \{ p \in P \mid h_i(p) = 1 \}$
[Figure: $h_i$ maps the points to the values $1, 2, 3, 4, \ldots, f / 2^i$]
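The point of hashing instead of flipping coins is that the sampling decision for a point is the same when it is inserted and when it is later deleted. The sketch below illustrates this with a stand-in hash built from `hashlib.blake2b`; the slides do not say which hash family is used, so this choice, the `seed` parameter, the clamping to one bucket when $f / 2^i < 1$, and the names `h`, `sampled` and `replay` are all assumptions. Points are assumed to be distinct.

```python
import hashlib

def h(level, p, f, seed=0):
    """Stand-in for h_i: maps a point to a value in {1, ..., f / 2**level}."""
    buckets = max(1, f // 2**level)  # assumption: at least one bucket
    data = f"{seed}|{level}|{p}".encode()
    digest = hashlib.blake2b(data, digest_size=8).digest()
    return int.from_bytes(digest, "big") % buckets + 1

def sampled(level, p, f):
    """p belongs to S_i exactly when h_i(p) = 1."""
    return h(level, p, f) == 1

def replay(ops, level, f):
    """Maintain S_i under inserts and deletes without remembering past coin flips."""
    sample = set()
    for kind, p in ops:
        if not sampled(level, p, f):
            continue  # same decision on insert and on delete
        if kind == "insert":
            sample.add(p)
        else:
            sample.discard(p)
    return sample

grid = [(x, y) for x in range(1, 17) for y in range(1, 17)]
ops = [("insert", p) for p in grid] + [("delete", p) for p in grid if p[0] % 2 == 0]
S = replay(ops, level=1, f=64)
survivors = {p for p in grid if p[0] % 2 == 1}
```

After the replay, `S` equals the set of surviving points that hash to 1, which is what sampling the final point set directly would give.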

Randomized Method
for each level $i$ do
    $F^{(i)} \leftarrow$ set of all marked cells $C$ in level $i$ such that
        a) no subcell of $C$ is marked
        b) no smaller cell within a distance of less than $2^{i-1}$ is marked
return $E_{rand}(P) = f \cdot \sum_{i=0}^{\log \Delta} |F^{(i)}|$

$E_{rand}(P) = \Theta(E_{det}(P))$

Streaming Method
Idea: Reduction to counting distinct elements
Implementation:
  – For each level i, count distinct elements in
    DE1(i) = {C | C is in level i and marked} ∪ {C | C is in level i and a) or b) fails}
    and DE2(i) = {C | C is in level i and a) or b) fails}
  – Output the difference as cost for level i
[Figure: DE1(i), DE2(i), DE1(i+1), DE2(i+1)]
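The difference of the two distinct counts is the number of marked cells for which both a) and b) hold, because DE2(i) is contained in DE1(i). The sketch below checks this identity with exact Python sets; in the streaming setting each count would come from a distinct-elements estimator, which is not shown here. The cell labels and the name `level_count` are hypothetical.

```python
def level_count(marked, failing):
    """|DE1| - |DE2| for one level, computed with exact sets."""
    de1 = marked | failing  # marked cells together with cells where a) or b) fails
    de2 = failing           # cells where a) or b) fails
    return len(de1) - len(de2)

marked = {"A", "B", "C"}
failing = {"B", "D"}  # may also contain unmarked cells such as "D"
count = level_count(marked, failing)
```

Here `count` is 2, the size of `marked - failing` (the cells "A" and "C").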

Conclusion & Future Work
Streaming Algorithm for Dynamic FLP:
• constant factor approximation of cost
• update time: $O(\log(1/\delta) \cdot \mathrm{polylog}(\Delta))$
• space: $O(\log(1/\delta) \cdot \mathrm{polylog}(\Delta))$
• failure probability: $\delta$
Future Work:
• approximation factor not exponential in $d$
• $(1+\varepsilon)$-approximation algorithm

Thank you for your attention!
Department of Computer Science
Technische Universität Dortmund
Otto-Hahn-Str. 14
44221 Dortmund, Germany
Phone: +49 231 755-4762
Fax: +49 231 755-2047
Email: [email protected]
Web: ls2-www.cs.uni-dortmund.de/~lammersen/