Facility Location in Dynamic Geometric Data Streams

Post on 05-Jan-2016


Christiane Lammersen, Christian Sohler

Dynamic Geometric Data Streams

• Streams of geometric data arise in
  – Mobile networks
  – Sensor networks
  – …
• Continuously changing data
  – Mobile networks: position of nodes
  – Sensor networks: measured data
• Communication in form of update operations
  – Update consists of ID of node, old value, new value
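
The update model above can be sketched in a few lines of Python. This is only an illustration of the semantics (an update acts as a delete of the old value followed by an insert of the new one). The `Update` record, the `apply_update` helper and the use of `None` for "no old value" or "no new value" are assumptions made for this sketch and do not come from the slides. A streaming algorithm would not store all positions like this.

```python
from collections import namedtuple

# Hypothetical record type: the slides only say that an update carries
# the node ID, the old value and the new value.
Update = namedtuple("Update", ["node_id", "old", "new"])


def apply_update(positions, update):
    """Apply one update to a dict {node_id: value}.

    The update is treated as a delete of the old value followed by an
    insert of the new one. old=None means a fresh insert, new=None means
    a pure delete (both conventions are assumptions of this sketch).
    """
    if update.old is not None:
        # The stream is assumed to be consistent with the current state.
        if positions.get(update.node_id) != update.old:
            raise ValueError("inconsistent update for node %r" % (update.node_id,))
        del positions[update.node_id]
    if update.new is not None:
        positions[update.node_id] = update.new
    return positions


positions = {}
stream = [
    Update(1, None, (3, 4)),    # node 1 appears
    Update(2, None, (7, 7)),    # node 2 appears
    Update(1, (3, 4), (5, 4)),  # node 1 moves
    Update(2, (7, 7), None),    # node 2 disappears
]
for u in stream:
    apply_update(positions, u)
print(positions)
```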

IITK Workshop on Algorithms for Processing Massive Data Sets, Christiane Lammersen


Hierarchical Communication Systems

• upper layer offers lower layer a certain service
• each node can be a server
• cost for server ↔ access time


Dynamic Geometric Data Streams

• m insert and delete operations
• points in low-dimensional, discrete space {1, ..., Δ}^d
• polylog(Δ, m) memory space, one pass

[Indyk ’04]


Dynamic Uniform FLP

• point set P
• facilities have uniform opening cost f
• clients have uniform demand b
• goal: maintaining F ⊆ P, so as to minimize

  f · |F| + b · Σ_{p ∈ P} min_{q ∈ F} ‖p − q‖

FLP is related to k-Median, but |F| can be Ω(|P|)
⇒ problem in streaming
⇒ approximation of the cost
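
To make the objective concrete, here is a small Python sketch that evaluates the cost f · |F| + b · Σ_{p ∈ P} min_{q ∈ F} ‖p − q‖ for a given set of open facilities. The function name `flp_cost` and the choice of Euclidean distance are assumptions of this sketch. It evaluates a fixed solution by brute force and is not a streaming algorithm.

```python
import math


def flp_cost(points, facilities, f, b):
    """Cost of opening `facilities` for the clients in `points`.

    Opening cost: f per facility. Connection cost: demand b times the
    Euclidean distance from each point to its nearest open facility.
    """
    if not facilities:
        raise ValueError("at least one facility must be open")
    opening = f * len(facilities)
    connection = b * sum(
        min(math.dist(p, q) for q in facilities) for p in points
    )
    return opening + connection


P = [(0, 0), (3, 4), (6, 8)]
one = flp_cost(P, [(0, 0)], f=8, b=1)          # 8 + (0 + 5 + 10)
two = flp_cost(P, [(0, 0), (6, 8)], f=8, b=1)  # 16 + (0 + 5 + 0)
print(one, two)
```

In this toy instance the second facility pays for itself: it adds 8 to the opening cost but removes 10 from the connection cost.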


Related Work

• P. Indyk: Algorithms for Dynamic Geometric Problems over Data Streams, STOC ’04
  – O(log² Δ)-approximation for cost of FLP
  – Idea: nested square grids, open facility in all heavy cells
• G. Frahling and C. Sohler: Coresets in Dynamic Geometric Data Streams, STOC ’05
  – space partition based on heavy cells


Construction of Our Streaming Method

• deterministic method: E_det(P) = Θ(OPT(P))
• randomized method: E_rand(P) = Θ(E_det(P))
• streaming method: E_stream(P) = Θ(E_rand(P))

Deterministic Method

• Impose log(Δ)+1 nested square grids
• In each grid, identify the heavy cells
• Partition the input space based on the heavy cells
• For each cell size, count the number of points within cells of that size ⇒ estimator for cost:

  E_det(P) = Σ_{i=0}^{log Δ} Σ_{C ∈ SP_i} n(C) · 2^i

  where SP_i denotes the level-i cells of the space partition and n(C) the number of points in cell C.

[Indyk ’04, Frahling and Sohler ’05]

Idea: Open one facility in each heavy cell in the space partition.

Nested Grids

• Impose log(Δ)+1 nested square grids

[Figure sequence: the grids for Δ = 16, shown level by level from level 4 down to level 0]
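
A minimal Python sketch of the nested grids, assuming Δ is a power of two, that a level-i cell has side length 2^i, and that coordinates are 1-based as in {1, ..., Δ}^d. The helper names `num_levels` and `cell_of` are made up for this example. Under these assumptions the cell of a point at any level is obtained by a bit shift.

```python
def num_levels(delta):
    """Number of nested grids, log2(delta) + 1, for delta a power of two."""
    if delta < 1 or delta & (delta - 1):
        raise ValueError("delta is assumed to be a power of two")
    return delta.bit_length()


def cell_of(point, level):
    """Index of the level-`level` cell (side length 2**level) containing `point`.

    Coordinates are 1-based, so they are shifted to 0-based first.
    """
    return tuple((x - 1) >> level for x in point)


delta = 16
for level in reversed(range(num_levels(delta))):
    # Level 4 is a single cell covering everything, level 0 is one cell per grid point.
    print(level, cell_of((5, 12), level))
```

Because the grids are nested, the cell of a point at level i+1 is always the parent of its cell at level i, which is why a single shift per coordinate suffices.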


Space Partition

• In each grid, identify the heavy cells
• Partition the input space based on the heavy cells

Cell in level i is heavy if it contains at least f / 2^i points.

[Figure sequence: construction of the partition for f = 8 and Δ = 16, processed level by level from level 4 down to level 0]
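
The slides do not spell out the exact subdivision rule, so the following Python sketch makes an assumption: starting from the single top-level cell, a heavy cell is split into its non-empty subcells, and the recursion stops at light cells and at level 0. The names `cell_of`, `is_heavy` and `space_partition` are hypothetical, and the sketch works offline on a stored point list purely to illustrate the heavy-cell test n(C) ≥ f / 2^i.

```python
def cell_of(point, level):
    """Index of the level-`level` cell (side 2**level) of a 1-based point."""
    return tuple((x - 1) >> level for x in point)


def is_heavy(count, level, f):
    """Heavy test count >= f / 2**level, written without division."""
    return count * (2 ** level) >= f


def space_partition(points, delta, f):
    """Return the cells of the partition as sorted (level, cell, count) triples.

    Assumed rule: split heavy cells (level > 0) into their non-empty
    subcells; keep light cells and level-0 cells as they are.
    """
    top = delta.bit_length() - 1  # log2(delta), delta a power of two
    leaves = []
    stack = [(top, list(points))]
    while stack:
        level, pts = stack.pop()
        if not pts:
            continue
        if level > 0 and is_heavy(len(pts), level, f):
            children = {}
            for p in pts:
                children.setdefault(cell_of(p, level - 1), []).append(p)
            for child_pts in children.values():
                stack.append((level - 1, child_pts))
        else:
            leaves.append((level, cell_of(pts[0], level), len(pts)))
    return sorted(leaves)


P = [(1, 1), (1, 2), (2, 2), (4, 4)]
partition = space_partition(P, delta=4, f=4)
for leaf in partition:
    print(leaf)
```

With Δ = 4 and f = 4 the thresholds are 1, 2 and 4 points at levels 2, 1 and 0. The lower-left level-1 cell holds three points and is split, while the cell containing (4, 4) holds one point and stays as a light level-1 cell.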


Cost Estimator

• For each cell size, count the number of points within cells of that size ⇒ estimator for cost:

  E_det(P) = Σ_{i=0}^{log Δ} Σ_{C ∈ SP_i} n(C) · 2^i

[Figure sequence: worked example that adds up the terms n(C) · 2^i level by level; the labels "9 points" and "7 points" mark point counts in the picture]
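
As a small illustration, the estimator E_det(P) = Σ_i Σ_{C ∈ SP_i} n(C) · 2^i only needs, for each level i, the total number of points that lie in level-i cells of the partition. The Python sketch below assumes the partition is already given as (level, point count) pairs. The function names and the toy numbers are invented for this example and are not the numbers from the slides.

```python
from collections import Counter


def points_per_level(partition_cells):
    """Total number of points in partition cells of each level."""
    totals = Counter()
    for level, n_points in partition_cells:
        totals[level] += n_points
    return dict(totals)


def det_estimate(partition_cells):
    """Sum of n(C) * 2**i over all cells C of the partition (i = level of C)."""
    per_level = points_per_level(partition_cells)
    return sum(n * (2 ** level) for level, n in per_level.items())


# Hypothetical partition: three level-0 cells and one level-1 cell,
# each containing a single point.
cells = [(0, 1), (0, 1), (0, 1), (1, 1)]
print(points_per_level(cells))
print(det_estimate(cells))  # 3 * 2**0 + 1 * 2**1
```

Only one counter per level is needed, which is what makes this quantity a candidate for small-space maintenance.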

Value of Cost Estimator is Ω(OPT(P))

• A heavy cell in level i contains Ω(f / 2^i) points.
• The space partition is balanced.
• The distance of a cell in level i to a heavy cell is O(2^i).
• Contribution of heavy cell C in level i is at most … (bound lost in extraction)
• Contribution of light cell C in level i is at most … (bound lost in extraction)

[The two bounds are garbled in the source; the surviving symbols are f, n(C), d and 2^i.]

Value of Cost Estimator is O(OPT(P))

• Contribution of distant cell C in level i is at least n(C) · 2^{i-1}
• OPT(P) ≥ f · |F_OPT|
• Estimated cost for near cell C in level i is n(C) · 2^i = O(f)
• There is a constant number of near cells.
• Estimated cost for near cells is O(f · |F_OPT|)

[Figure: a cell in level i and a surrounding region of radius 2^{i-1}]


Randomized Method

Idea:
  – Heavy cell in level i contains at least f / 2^i points
  – Sample a point in level i with probability 2^i / f

Problem: coin flips & delete operations

Solution:
  – Hash function h_i : {1, …, Δ}^d → {1, …, f / 2^i}
  – Sample set S_i = { p ∈ P | h_i(p) = 1 }

[Figure: h_i maps the points to buckets 1, 2, 3, 4, …, f / 2^i]
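
The point of hashing instead of flipping coins is that the sampling decision for a point is a fixed function of the point, so a later delete makes the same decision as the earlier insert. The Python sketch below illustrates this. SHA-256 is used only as a convenient stand-in for the hash family (the slides do not say which family is used), and `h`, `in_sample`, `LevelSample` and the `seed` parameter are names invented for this example.

```python
import hashlib


def h(level, point, f, seed=0):
    """Stand-in for h_i: maps a point to a bucket in {1, ..., f / 2**level}."""
    buckets = max(1, f // (2 ** level))  # clamp to one bucket when 2**level >= f
    data = repr((seed, level, tuple(point))).encode()
    digest = hashlib.sha256(data).digest()
    return int.from_bytes(digest[:8], "big") % buckets + 1


def in_sample(level, point, f, seed=0):
    """A point belongs to S_i exactly when it hashes to bucket 1."""
    return h(level, point, f, seed) == 1


class LevelSample:
    """Multiset S_i maintained under insertions and deletions."""

    def __init__(self, level, f, seed=0):
        self.level, self.f, self.seed = level, f, seed
        self.counts = {}

    def insert(self, point):
        if in_sample(self.level, point, self.f, self.seed):
            self.counts[point] = self.counts.get(point, 0) + 1

    def delete(self, point):
        # Same hash value as at insertion time, so the decision is consistent.
        if in_sample(self.level, point, self.f, self.seed):
            remaining = self.counts.get(point, 0) - 1
            if remaining < 0:
                raise ValueError("delete of a point that was never inserted")
            if remaining == 0:
                del self.counts[point]
            else:
                self.counts[point] = remaining


sample = LevelSample(level=1, f=8)
pts = [(x, y) for x in range(1, 17) for y in range(1, 17)]
for p in pts:
    sample.insert(p)
for p in pts:
    sample.delete(p)
print(sample.counts)  # every sampled insert is undone by its delete
```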

Randomized Method

for each level i do
    F(i) ← set of all marked cells C in level i such that
        a) no subcell of C is marked
        b) no smaller cell within a distance of less than 2^{i-1} is marked

return E_rand(P) = f · Σ_{i=0}^{log Δ} |F(i)|

E_rand(P) = Θ(E_det(P))
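
The selection rule can be sketched offline in Python under several assumptions that the slides leave open: the marked cells of each level are given as input (`marked[i]` is a set of cell index tuples), a "smaller cell" is any cell of a lower level, a level-i cell with index c covers [c · 2^i, (c+1) · 2^i] in each coordinate, and the distance between two cells is measured as the L∞ gap between these boxes. All function names are hypothetical.

```python
def box(cell, level):
    """Per-axis (lo, hi) extent of a cell with side length 2**level."""
    side = 1 << level
    return [(c * side, (c + 1) * side) for c in cell]


def gap(cell_a, level_a, cell_b, level_b):
    """L-infinity gap between two cells (0 if they touch or overlap)."""
    g = 0
    for (lo_a, hi_a), (lo_b, hi_b) in zip(box(cell_a, level_a), box(cell_b, level_b)):
        g = max(g, lo_b - hi_a, lo_a - hi_b)
    return g


def contains(cell, level, sub, sub_level):
    """True if the lower-level cell `sub` lies inside `cell`."""
    shift = level - sub_level
    return tuple(c >> shift for c in sub) == tuple(cell)


def select_cells(marked, level):
    """F(i): marked level-i cells for which conditions a) and b) hold."""
    chosen = set()
    for cell in marked.get(level, ()):
        ok = True
        for j in range(level):  # all smaller cell sizes
            for other in marked.get(j, ()):
                if contains(cell, level, other, j):
                    ok = False  # a) fails: a subcell is marked
                elif gap(cell, level, other, j) < 2 ** (level - 1):
                    ok = False  # b) fails: a smaller marked cell is too close
        if ok:
            chosen.add(cell)
    return chosen


def rand_estimate(marked, top_level, f):
    """f times the total number of selected cells over all levels."""
    return f * sum(len(select_cells(marked, i)) for i in range(top_level + 1))


marked = {0: {(0, 0)}, 2: {(0, 0), (3, 3)}}
print(select_cells(marked, 2))
print(rand_estimate(marked, top_level=4, f=8))
```

In the example, the level-2 cell (0, 0) is rejected because it contains the marked level-0 cell, while (3, 3) is far from every smaller marked cell and is kept.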

Streaming Method

Idea: Reduction to counting distinct elements

Implementation:
- For each level i count distinct elements in
    DE1(i) = {C | C is in level i and marked} ∪ {C | C is in level i and a) or b) fails}
  and
    DE2(i) = {C | C is in level i and a) or b) fails}
- Output difference as cost for level i

[Figure: the counters DE1(i), DE2(i), DE1(i+1), DE2(i+1) for consecutive levels]
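
The reduction rests on the identity |A ∪ B| − |B| = |A \ B|: with A the marked cells of level i and B the cells of level i for which a) or b) fails, the difference of the two distinct counts is the number of marked cells that pass both conditions. The Python sketch below checks this with exact sets. Plain sets are an assumption made for readability: they stand in for a small-space distinct-elements counter and, unlike the real setting, do not handle deletions.

```python
def selected_count_via_distinct(marked_stream, failing_stream):
    """Number of marked cells passing a) and b), as |DE1| - |DE2|.

    marked_stream:  cell IDs reported as marked (repeats allowed)
    failing_stream: cell IDs for which a) or b) fails (repeats allowed)
    """
    de1 = set()  # stands in for a distinct-elements counter over both streams
    de2 = set()  # stands in for a distinct-elements counter over failing cells
    for cell in marked_stream:
        de1.add(cell)
    for cell in failing_stream:
        de1.add(cell)
        de2.add(cell)
    return len(de1) - len(de2)


marked_cells = ["a", "b", "b", "c"]
failing_cells = ["b", "d", "d"]
print(selected_count_via_distinct(marked_cells, failing_cells))
```

Here the cells a and c are marked and pass both conditions, so the difference is 2 even though the failing cell d was never marked.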

Conclusion & Future Work

Streaming Algorithm for Dynamic FLP:
• constant factor approximation of cost
• update time: O(log(1/δ) · polylog(Δ))
• space: O(log(1/δ) · polylog(Δ))
• failure probability: δ

Future Work:
• approximation factor not exponential in d
• (1+ε)-approximation algorithm

Thank you for your attention!

Department of Computer Science
Technische Universität Dortmund
Otto-Hahn-Str. 14
44221 Dortmund, Germany

Phone: +49 231 755-4762
Fax: +49 231 755-2047
Email: christiane.lammersen@tu-dortmund.de
http://ls2-www.cs.uni-dortmund.de/~lammersen/