Facility Location in Dynamic Geometric Data Streams


Christiane Lammersen, Christian Sohler

Dynamic Geometric Data Streams

• Streams of geometric data arise in
  – Mobile networks
  – Sensor networks
  – …
• Continuously changing data
  – Mobile networks: position of nodes
  – Sensor networks: measured data
• Communication in form of update operations
  – Update consists of ID of node, old value, new value
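The update model above can be pictured with a small sketch. It treats one update (node ID, old value, new value) as a deletion of the old point followed by an insertion of the new one. The function name `apply_update` and the explicit dictionaries are illustrative assumptions; a streaming algorithm would not store the points like this, the sketch only shows the semantics of an update.

```python
from collections import Counter
from typing import Dict, Hashable, Optional, Tuple

Point = Tuple[int, ...]


def apply_update(positions: Dict[Hashable, Point], points: Counter,
                 node_id: Hashable, old: Optional[Point],
                 new: Optional[Point]) -> None:
    """Apply one update as 'delete old value, insert new value'.

    old=None models a node that appears for the first time,
    new=None models a node that leaves (both are assumptions).
    """
    if old is not None:
        points[old] -= 1          # delete operation
        if points[old] == 0:
            del points[old]
    if new is not None:
        points[new] += 1          # insert operation
        positions[node_id] = new
    else:
        positions.pop(node_id, None)


positions: Dict[Hashable, Point] = {}
points: Counter = Counter()
apply_update(positions, points, "node-a", None, (1, 2))    # node appears
apply_update(positions, points, "node-a", (1, 2), (3, 4))  # node moves
```

After the two calls, the multiset holds only the point (3, 4).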

IITK Workshop on Algorithms for Processing Massive Data Sets, Christiane Lammersen

Hierarchical Communication Systems

• upper layer offers lower layer a certain service
• each node can be a server
• cost for server ↔ access time


Dynamic Geometric Data Streams

• m insert and delete operations
• points in low-dimensional, discrete space {1, ..., Δ}^d
• polylog(Δ, m) memory space, one pass

[Indyk ’04]

Dynamic Uniform FLP

• point set P
• facilities have uniform opening cost f
• clients have uniform demand b
• goal: maintaining F ⊆ P, so as to minimize

$f \cdot |F| + b \cdot \sum_{p \in P} \min_{q \in F} \|p - q\|$

FLP is related to k-Median, but |F| can be Ω(|P|) => problem in streaming => approximation of the cost
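As a small illustration of the objective, the sketch below evaluates the cost of a given facility set: opening cost f per facility plus demand b times the distance of each point to its nearest facility. The function name `flp_cost` and the use of the Euclidean distance are assumptions made for this example.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[int, ...]


def flp_cost(points: Sequence[Point], facilities: Sequence[Point],
             f: float, b: float) -> float:
    """Cost of opening `facilities` for the point set `points`."""
    if not facilities:
        raise ValueError("at least one facility must be open")
    # Each client connects to its nearest open facility.
    connection = sum(min(math.dist(p, q) for q in facilities) for p in points)
    return f * len(facilities) + b * connection


P = [(0, 0), (3, 4)]
one_facility = flp_cost(P, [(0, 0)], f=8, b=1)   # 8 + 5
two_facilities = flp_cost(P, P, f=8, b=1)        # 16 + 0
```

With f = 8, serving the second point from the first facility is cheaper than opening a second facility.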

Related Work

• P. Indyk: Algorithms for Dynamic Geometric Problems over Data Streams, STOC 04
  – O(log^2 Δ)-approximation for cost of FLP
  – Idea: nested square grids, open facility in all heavy cells
• G. Frahling and C. Sohler: Coresets in Dynamic Geometric Data Streams, STOC 05
  – space partition based on heavy cells

Construction of Our Streaming Method

deterministic method: E_det(P) = Θ(OPT(P))

randomized method: E_rand(P) = Θ(E_det(P))

streaming method: E_stream(P) = Θ(E_rand(P))


Deterministic Method

• Impose log(Δ)+1 nested square grids
• In each grid, identify the heavy cells
• Partition the input space based on the heavy cells
• For each cell size, count the number of points within cells of that size => estimator for cost:

$E_{det}(P) = \sum_{i \ge 0} \sum_{C \in SP_i} n(C) \cdot 2^i$

(SP_i: cells of level i in the space partition; n(C): number of points in cell C)

[Indyk ’04, Frahling and Sohler ’05]

Idea: Open one facility in each heavy cell in the space partition.

Nested Grids

• Impose log(Δ)+1 nested square grids

[Figure: nested grids for Δ = 16, shown for levels 4, 3, 2, 1, 0]
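A minimal sketch of how a point can be mapped to its cell in each of the log(Δ)+1 nested grids. It assumes that Δ is a power of two, that coordinates lie in {1, ..., Δ}, and that cells in level i have side length 2^i; the function names are illustrative.

```python
from typing import List, Tuple

Point = Tuple[int, ...]


def cell_of(point: Point, level: int) -> Tuple[int, ...]:
    """Index of the level-`level` grid cell (side length 2**level) containing `point`."""
    # Coordinates start at 1, so shift to 0-based before dividing by 2**level.
    return tuple((x - 1) >> level for x in point)


def cells_all_levels(point: Point, delta: int) -> List[Tuple[int, Tuple[int, ...]]]:
    """Cells containing `point` in all log2(delta) + 1 nested grids."""
    top = delta.bit_length() - 1  # log2(delta) for a power of two
    return [(level, cell_of(point, level)) for level in range(top + 1)]


cells = cells_all_levels((16, 1), delta=16)
```

For Δ = 16 this yields five levels; in level 4 the whole space is a single cell with index (0, 0).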


Space Partition

• In each grid, identify the heavy cells
• Partition the input space based on the heavy cells

Cell in level i is heavy if it contains at least f / 2^i points.

[Figure: construction of the space partition for f = 8 and Δ = 16, shown step by step for levels 4, 3, 2, 1, 0]
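The heavy-cell rule can be sketched offline as follows: count the points per cell of one grid level and keep the cells that reach the threshold f / 2^i. The sketch stores all points, which a streaming algorithm cannot afford; the function name and the 1-based coordinate convention are assumptions.

```python
from collections import Counter
from typing import Iterable, Set, Tuple

Point = Tuple[int, ...]


def heavy_cells(points: Iterable[Point], f: float, level: int) -> Set[Tuple[int, ...]]:
    """Cells of the given level that contain at least f / 2**level points."""
    counts = Counter(tuple((x - 1) >> level for x in p) for p in points)
    threshold = f / 2 ** level
    return {cell for cell, n in counts.items() if n >= threshold}


P = [(1, 1), (2, 2), (9, 9)]
level2 = heavy_cells(P, f=8, level=2)  # threshold 8 / 4 = 2
level3 = heavy_cells(P, f=8, level=3)  # threshold 8 / 8 = 1
```

The threshold halves with every level, so larger cells need fewer points to count as heavy.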

Cost Estimator

• For each cell size, count the number of points within cells of that size => estimator for cost:

$E_{det}(P) = \sum_{i \ge 0} \sum_{C \in SP_i} n(C) \cdot 2^i$

[Figure: worked example of the estimator on the space partition, with cell counts such as "9 points" and "7 points"]
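The following offline sketch shows one possible reading of the estimator: every cell of the space partition in level i contributes its number of points times 2^i. How exactly the partition is derived from the heavy cells is not spelled out on the slides, so the sketch assumes a recursive subdivision that starts at the top-level cell, splits a cell while it is heavy, and stops at light cells or at level 0. This construction, the function name, and the requirement that Δ is a power of two are all assumptions.

```python
from collections import defaultdict
from typing import Sequence, Tuple

Point = Tuple[int, ...]


def estimate_cost(points: Sequence[Point], f: float, delta: int) -> int:
    """Sum of n(C) * 2**level over the cells C of the assumed space partition."""
    top = delta.bit_length() - 1           # log2(delta) for a power of two
    stack = [(top, list(points))]          # (level, points inside one cell)
    total = 0
    while stack:
        level, pts = stack.pop()
        if not pts:
            continue
        is_heavy = len(pts) >= f / 2 ** level
        if level == 0 or not is_heavy:
            total += len(pts) * 2 ** level  # cell belongs to the partition
            continue
        # Heavy cell: split into its subcells one level below.
        children = defaultdict(list)
        for p in pts:
            children[tuple((x - 1) >> (level - 1) for x in p)].append(p)
        stack.extend((level - 1, sub) for sub in children.values())
    return total


single = estimate_cost([(1, 1)], f=8, delta=16)
cluster = estimate_cost([(1, 1)] * 8, f=8, delta=16)
```

A single point ends up in a light cell of level 2 and contributes 1 · 2^2; eight points at the same position stay heavy down to level 0 and contribute 8 · 2^0.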

Value of Cost Estimator is Ω(OPT(P))

• Contribution of heavy cell C in level i is at most [bound garbled in the transcript]
• Contribution of light cell C in level i is at most [bound garbled in the transcript]

[The residue of the two bounds shows the terms f, n(C), d and 2^i.]

• A heavy cell in level i contains Ω(f / 2^i) points.
• The space partition is balanced.
• The distance of a cell in level i to heavy cell is O(2^i).

Value of Cost Estimator is O(OPT(P))

• Contribution of distant cell C in level i is at least n(C) · 2^{i-1}
• OPT(P) ≥ f · |F_OPT|
• Estimated cost for near cell C in level i is n(C) · 2^i = O(f)
• There is a constant number of near cells.
• Estimated cost for near cells is O(f · |F_OPT|)

[Figure: cell in level i, radius 2^{i-1}]

Randomized Method

Idea:
  – Heavy cell in level i contains at least f / 2^i points
  – Sample a point in level i with probability 2^i / f

Problem: coin flips & delete operations

Solution:
  – Hash function h_i : {1, …, Δ}^d → {1, …, f / 2^i}
  – Sample set S_i = { p ∈ P | h_i(p) = 1 }

[Figure: hash buckets 1, 2, 3, 4, …, f / 2^i]
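The point of replacing coin flips by a hash function is that the sampling decision for a point is the same when it is inserted and when it is later deleted. The sketch below illustrates this with SHA-256 as a stand-in hash; this is an assumption for illustration, not the hash family used by the authors. The names `h`, `in_sample`, `update_sample` and the `seed` parameter are hypothetical, and f / 2^i is rounded down to at least one bucket.

```python
import hashlib
from collections import Counter
from typing import Tuple

Point = Tuple[int, ...]


def h(point: Point, level: int, f: int, seed: int = 0) -> int:
    """Stand-in for h_i: maps a point to a bucket in {1, ..., f / 2**level}."""
    buckets = max(1, f // 2 ** level)
    digest = hashlib.sha256(repr((seed, level, tuple(point))).encode()).digest()
    return int.from_bytes(digest[:8], "big") % buckets + 1


def in_sample(point: Point, level: int, f: int, seed: int = 0) -> bool:
    """A point belongs to S_i exactly if it hashes to bucket 1."""
    return h(point, level, f, seed) == 1


def update_sample(sample: Counter, point: Point, level: int, f: int, sign: int) -> None:
    """Insert (sign=+1) or delete (sign=-1) a point; only sampled points are stored."""
    if in_sample(point, level, f):
        sample[point] += sign
        if sample[point] == 0:
            del sample[point]


sample: Counter = Counter()
update_sample(sample, (5, 7), level=3, f=8, sign=+1)  # one bucket: always sampled
update_sample(sample, (5, 7), level=3, f=8, sign=-1)  # same decision on delete
```

Because the decision depends only on the point and the level, a deletion removes exactly what the matching insertion added.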

Randomized Method

for each level i do
    F(i) = set of all marked cells C in level i such that
        a) no subcell of C is marked
        b) no smaller cell within a distance of less than 2^{i-1} is marked

return $E_{rand}(P) = f \cdot \sum_i |F(i)|$

E_rand(P) = Θ(E_det(P))

Streaming Method

Idea: Reduction to counting distinct elements

Implementation:
- For each level i count distinct elements in
  DE1(i) = {C | C is in level i and marked} ∪ {C | C is in level i and a) or b) fails}
  and DE2(i) = {C | C is in level i and a) or b) fails}
- Output difference as cost for level i

[Figure: DE1(i), DE2(i), DE1(i+1), DE2(i+1)]
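Why the difference of the two distinct-element counts gives the number of cells that are marked and satisfy both a) and b) can be checked with exact sets. The sketch below uses Python sets as a stand-in for a distinct-elements sketch that a streaming algorithm would use; the function name and the example cell labels are hypothetical.

```python
from typing import Hashable, Iterable


def level_count(marked: Iterable[Hashable], failing: Iterable[Hashable]) -> int:
    """|DE1(i)| - |DE2(i)| for one level, computed exactly with sets."""
    de2 = set(failing)            # cells where a) or b) fails
    de1 = set(marked) | de2       # marked cells together with the failing cells
    return len(de1) - len(de2)    # marked cells where a) and b) both hold


marked = {"A", "B", "C"}
failing = {"B", "D"}
count = level_count(marked, failing)  # cells A and C remain
```

Since DE2(i) is contained in DE1(i), the difference counts exactly the marked cells that are not in the failing set.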

Conclusion & Future Work

Streaming Algorithm for Dynamic FLP:
• constant factor approximation of cost
• update-time: O(log(1/δ) · polylog(Δ))
• space: O(log(1/δ) · polylog(Δ))
• failure probability: δ

Future Work:
• approximation factor not exponential in d
• (1+ε)-approximation algorithm

Thank you for your attention!

Department of Computer Science
Technische Universität Dortmund
Otto-Hahn-Str. 14
44221 Dortmund, Germany

Phone: +49 231 755-4762
Fax: +49 231 755-2047
Email: christiane.lammersen@tu-dortmund.de
http://ls2-www.cs.uni-dortmund.de/~lammersen/
