
Tight Bounds for Distributed Functional Monitoring

David Woodruff (IBM Almaden)

Qin Zhang (Aarhus University, MADALGO)

Distributed Functional Monitoring

[Figure: coordinator C connected to sites P1, P2, P3, …, Pk; each site Pi holds an input vector xi and communicates with C over time]

• Inputs: x1, x2, …, xk
• Updates: xi → xi + ej
• Static case vs. Dynamic case
• Problems on x1 + x2 + … + xk: sampling, p-norms, heavy hitters, compressed sensing, quantiles, entropy
• Authors: Can, Cormode, Huang, Muthukrishnan, Patt-Shamir, Shafrir, Tirthapura, Wang, Yi, Zhao, many others

Motivation

• Data distributed and stored in the cloud
– Impractical to put data on a single device
• Sensor networks
– Communication very power-intensive
• Network routers
– Bandwidth limitations

Problems

• Which functions f(x1, …, xk) do we care about?

• x1, …, xk are non-negative length-n vectors

• x = Σ_{i=1}^k xi

• f(x1, …, xk) = |x|_p = (Σ_{i=1}^n x_i^p)^{1/p}

• |x|_0 is the number of non-zero coordinates
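As a concrete illustration of these definitions, here is a minimal Python sketch (not from the talk) that aggregates the sites' vectors and evaluates |x|_p and |x|_0:

```python
import numpy as np

def p_norm_of_sum(site_vectors, p):
    """|x|_p for x = sum of the sites' non-negative length-n vectors."""
    x = np.sum(site_vectors, axis=0)
    if p == 0:
        return np.count_nonzero(x)  # |x|_0: number of non-zero coordinates
    return np.sum(x ** p) ** (1.0 / p)

# k = 3 sites, n = 4 coordinates
sites = [np.array([1, 0, 2, 0]), np.array([0, 0, 1, 0]), np.array([3, 0, 0, 0])]
print(p_norm_of_sum(sites, 2))  # Euclidean norm of x = (4, 0, 3, 0) -> 5.0
print(p_norm_of_sum(sites, 0))  # non-zero coordinates of x -> 2
```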

What is the randomized communication cost of these problems?

I.e., the minimal cost of a protocol which, for every input, fails with probability < 1/3

Static case, Dynamic case


Exact Answers

• An Ω(n) communication bound for computing |x|_p, p ≠ 1

• Reduction from 2-Player Set-Disjointness (DISJ)

• Alice has a set S ⊆ [n] of size n/4

• Bob has a set T ⊆ [n] of size n/4 with either |S ∩ T| = 0 or |S ∩ T| = 1

• Is S ∩ T = ∅? |S ∩ T| = 1 → DISJ(S, T) = 1; |S ∩ T| = 0 → DISJ(S, T) = 0

• [KS, R]: Ω(n) communication

• Prohibitive for applications
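To see why an exact |x|_p protocol decides DISJ: if Alice and Bob hold the indicator vectors of S and T, then |x|_p^p = |S| + |T| + (2^p − 2)|S ∩ T|, so for p ≠ 1 the exact value reveals whether the sets intersect. A small sketch of this reduction (illustrative, not from the talk):

```python
import numpy as np

def disj_from_exact_norm(S, T, n, p=2):
    """Decide DISJ(S, T) from the exact p-th power of |x|_p, x = 1_S + 1_T (p != 1)."""
    x = np.zeros(n)
    for j in S: x[j] += 1
    for j in T: x[j] += 1
    # |x|_p^p = |S| + |T| + (2^p - 2)|S ∩ T|: compare against the disjoint value
    return int(np.sum(x ** p) > len(S) + len(T))

n = 8
print(disj_from_exact_norm({0, 1}, {2, 3}, n))  # 0: disjoint
print(disj_from_exact_norm({0, 1}, {1, 3}, n))  # 1: |S ∩ T| = 1
```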

Approximate Answers

f(x1, …, xk) = (1 ± ε)·|x|_p

What is the randomized communication cost as a function of k, ε, and n?

Ignore log(nk/ε) factors

Previous Results

Lower bounds in static model, upper bounds in dynamic model (underlying vectors are non-negative)

• |x|_0: Ω(k + ε^{-2}) and O(k·ε^{-2})

• |x|_p: Ω(k + ε^{-2})

• |x|_2: O(k^2/ε + k^{1.5}/ε^3)

• |x|_p, p > 2: O(k^{2p+1}·n^{1-2/p}·poly(1/ε))

Our Results

Lower bounds in static model, upper bounds in dynamic model (underlying vectors are non-negative)

• |x|_0: Ω(k + ε^{-2}) and O(k·ε^{-2}), improved to Ω(k·ε^{-2})

• |x|_p: Ω(k + ε^{-2}), improved to Ω(k^{p-1}·ε^{-2}). Talk will focus on p = 2

• |x|_2: O(k^2/ε + k^{1.5}/ε^3), improved to O(k·poly(1/ε))

• |x|_p, p > 2: O(k^{2p+1}·n^{1-2/p}·poly(1/ε)), improved to O(k^{p-1}·poly(1/ε))

First lower bounds to depend on the product of k and ε^{-2}

Upper bound doesn't depend polynomially on n

Talk Outline

• Lower Bounds
– Non-zero elements
– Euclidean norm
• Upper Bounds
– p-norm

Previous Lower Bounds

• Lower bounds for any p-norm, p ≠ 1

• [CMY]: Ω(k)

• [ABC]: Ω(ε^{-2})
• Reduction from Gap-Orthogonality (GAP-ORT)

• Alice, Bob have u, v ∈ {0,1}^{1/ε^2}, respectively

• Promise, where Δ(u, v) is the Hamming distance: |Δ(u, v) − 1/(2ε^2)| < 1/ε or |Δ(u, v) − 1/(2ε^2)| > 2/ε

• [CR, S]: Ω(ε^{-2}) communication
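A tiny sketch (illustrative, not from the talk) of the GAP-ORT promise, taking Δ to be Hamming distance:

```python
import numpy as np

def gap_ort_case(u, v, eps):
    """Classify a GAP-ORT instance: Δ(u, v) near 1/(2eps^2) vs. far from it."""
    d = np.count_nonzero(u != v)          # Hamming distance Δ(u, v)
    gap = abs(d - 1 / (2 * eps**2))
    if gap < 1 / eps:
        return "near"                     # |Δ - 1/(2ε²)| < 1/ε
    if gap > 2 / eps:
        return "far"                      # |Δ - 1/(2ε²)| > 2/ε
    return "outside promise"

eps = 0.1
rng = np.random.default_rng(0)
u = rng.integers(0, 2, size=int(1 / eps**2))
v = rng.integers(0, 2, size=int(1 / eps**2))
print(gap_ort_case(u, v, eps))            # a uniform pair has Δ ≈ 50, typically "near"
```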

Talk Outline

• Lower Bounds
– Non-zero elements
– Euclidean norm
• Upper Bounds
– p-norm

Lower Bound for Distinct Elements

• Improve bound to optimal Ω(k·ε^{-2})

• Simpler problem: k-GAP-THRESH

– Each site Pi holds a bit Zi

– Zi are i.i.d. Bernoulli(β)

– Decide if Σ_{i=1}^k Zi > βk + (βk)^{1/2} or Σ_{i=1}^k Zi < βk − (βk)^{1/2}; otherwise don't care

• Rectangle property: for any correct protocol transcript τ, Z1, Z2, …, Zk are independent conditioned on τ
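A minimal sketch (illustrative, not from the talk) of the k-GAP-THRESH promise on i.i.d. Bernoulli(β) bits:

```python
import numpy as np

def k_gap_thresh(Z, beta):
    """Return which side of the promise sum(Z) falls on, or None in the gap."""
    k = len(Z)
    s, dev = Z.sum(), np.sqrt(beta * k)
    if s > beta * k + dev:
        return "high"   # Σ Zi > βk + (βk)^{1/2}
    if s < beta * k - dev:
        return "low"    # Σ Zi < βk - (βk)^{1/2}
    return None         # don't-care region

rng = np.random.default_rng(1)
beta, k = 0.1, 1000
Z = rng.binomial(1, beta, size=k)   # each site Pi holds bit Zi ~ Bernoulli(β)
print(Z.sum(), k_gap_thresh(Z, beta))
```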

A Key Lemma

• Lemma: For any protocol Π which succeeds w.pr. > .9999, the transcript τ is such that w.pr. > 1/2, for at least k/2 different i, H(Zi | τ) < H(.01β)

• Proof: Suppose τ does not satisfy this
– With large probability, βk − O((βk)^{1/2}) < E[Σ_{i=1}^k Zi | τ] < βk + O((βk)^{1/2})

– Since the Zi are independent given τ, Σ_{i=1}^k Zi | τ is a sum of independent Bernoullis

– Since most H(Zi | τ) are large, by anti-concentration, both events occur with constant probability: Σ_{i=1}^k Zi | τ > βk + (βk)^{1/2} and Σ_{i=1}^k Zi | τ < βk − (βk)^{1/2}

– So Π can't succeed with large probability
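The anti-concentration step can be checked empirically: a sum of k independent Bernoulli(β) bits lands above βk + (βk)^{1/2} and below βk − (βk)^{1/2} each with constant probability. A quick Monte Carlo sketch (illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
beta, k, trials = 0.1, 1000, 100_000
# Σ Zi for i.i.d. Zi ~ Bernoulli(β) is Binomial(k, β); sample it directly
sums = rng.binomial(k, beta, size=trials)
dev = np.sqrt(beta * k)  # the (βk)^{1/2} deviation in the promise
print("P[S > βk + (βk)^{1/2}] ≈", np.mean(sums > beta * k + dev))  # constant, ≈ 0.13 here
print("P[S < βk - (βk)^{1/2}] ≈", np.mean(sums < beta * k - dev))  # constant, ≈ 0.13 here
```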

Composition Idea

[Figure: coordinator C connected to sites P1, P2, P3, …, Pk; each site Pi holds a bit Zi, determined by a DISJ instance with C]

• The input to Pi in k-GAP-THRESH, denoted Zi, is the output of a 2-party Disjointness (DISJ) instance between C and Pi

– Let X be a random set of size 1/(4ε^2) from {1, 2, …, 1/ε^2}
– For each i, if Zi = 1, then choose Yi so that DISJ(X, Yi) = 1, else choose Yi so that DISJ(X, Yi) = 0
– Distributional complexity Ω(1/ε^2) [Razborov]

• Can think of C as a player
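A minimal sketch of composing the bits Zi with DISJ instances (illustrative; the actual inputs follow Razborov's hard distribution, which is more delicate than this uniform sampling):

```python
import random

def compose(Z, eps, seed=0):
    """Build a DISJ instance (X, Yi) for each site so that DISJ(X, Yi) = Zi."""
    rng = random.Random(seed)
    m = int(1 / eps**2)                        # universe {0, ..., m-1}
    X = set(rng.sample(range(m), m // 4))      # random set of size 1/(4ε²)
    outside = list(set(range(m)) - X)
    Ys = []
    for z in Z:
        Yi = set(rng.sample(outside, m // 4 - z))  # disjoint from X...
        if z == 1:
            Yi.add(rng.choice(list(X)))        # ...plus one common element if Zi = 1
        Ys.append(Yi)
    return X, Ys

Z = [1, 0, 1, 1]                               # the sites' k-GAP-THRESH bits
X, Ys = compose(Z, eps=0.2)
print([len(X & Yi) for Yi in Ys])              # [1, 0, 1, 1] matches Z
```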

Putting it All Together

• Key Lemma → For most i, H(Zi | τ) < H(.01β)

• Since H(Zi) = H(β) for all i, for most i protocol Π solves DISJ(X, Yi) with constant probability

• Since the Zi | τ are independent, solving DISJ requires communication Ω(ε^{-2}) on each of k/2 copies

• Total communication is Ω(k·ε^{-2})

• Can show a reduction:
– |x|_0 > 1/(2ε^2) + 1/ε if Σ_{i=1}^k Zi > βk + (βk)^{1/2}
– |x|_0 < 1/(2ε^2) − 1/ε if Σ_{i=1}^k Zi < βk − (βk)^{1/2}

Talk Outline

• Lower Bounds
– Non-zero elements
– Euclidean norm
• Upper Bounds
– p-norm

Lower Bound for Euclidean Norm

• Improve Ω(k + ε^{-2}) bound to optimal Ω(k·ε^{-2})

• Base problem: Gap-Orthogonality (GAP-ORT(X, Y))
– Consider the uniform distribution on (X, Y)

• We observe an information lower bound for GAP-ORT

• Sherstov's lower bound for GAP-ORT holds for the uniform distribution on (X, Y)

• [BBCR] + [Sherstov] → for any protocol Π and t > 0, I(X, Y; Π) = Ω(1/(ε^2 log t)) or Π uses t communication

Information Implications

• By the chain rule,

I(X, Y; Π) = Σ_{i=1}^{1/ε^2} I(Xi, Yi; Π | X_{<i}, Y_{<i}) = Ω(ε^{-2})

• For most i, I(Xi, Yi; Π | X_{<i}, Y_{<i}) = Ω(1)

• Maximum Likelihood Principle: non-trivial advantage in guessing (Xi, Yi)

2-BIT k-Party DISJ

• Choose a random j ∈ [k^2]; one of four cases holds:
– j doesn't occur in any Ti

– j occurs only in T1, …, T_{k/2}

– j occurs only in T_{k/2+1}, …, Tk

– j occurs in T1, …, Tk

• All j' ≠ j occur in at most one set Ti (assume k ≥ 4)

• We show Ω(k) information cost

[Figure: sites P1, P2, …, Pk holding sets T1, T2, …, Tk ⊆ [k^2]]
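A sketch of an instance generator (illustrative; the pairing of the two case bits with the halves is an assumption here, and the actual hard distribution is more subtle):

```python
import random

def two_bit_kparty_disj(bits, k, seed=0):
    """Build T1..Tk ⊆ [k²] where the two bits decide which halves contain j."""
    rng = random.Random(seed)
    universe = list(range(k * k))
    j = rng.choice(universe)
    rest = [e for e in universe if e != j]
    rng.shuffle(rest)
    Ts = [set(rest[i::k]) for i in range(k)]   # disjoint: every j' != j in at most one Ti
    b1, b2 = bits
    for i in range(k):
        if (b1 and i < k // 2) or (b2 and i >= k // 2):
            Ts[i].add(j)                       # j joins the selected half/halves
    return j, Ts

j, Ts = two_bit_kparty_disj(bits=(1, 0), k=4)
print([j in T for T in Ts])                    # [True, True, False, False]
```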

We compose GAP-ORT with a variant of k-Party DISJ

Rough Composition Idea

[Figure: a stack of 1/ε^2 2-BIT k-party DISJ instances composed into one GAP-ORT instance]

• Bits Xi and Yi in GAP-ORT determine the output of the i-th 2-BIT k-party DISJ instance

• An algorithm for approximating the Euclidean norm solves GAP-ORT, therefore solves most 2-BIT k-party DISJ instances

• Show Ω(k/ε^2) overall information is revealed
– Information adds (if we condition on enough "helper" variables)
– Pi participates in all instances

Talk Outline

• Lower Bounds
– Non-zero elements
– Euclidean norm
• Upper Bounds
– p-norm

Algorithm for p-norm

• We get k^{p-1}·poly(1/ε), improving k^{2p+1}·n^{1-2/p}·poly(1/ε) for general p and O(k^2/ε + k^{1.5}/ε^3) for p = 2

• Our protocol is the first 1-way protocol; that is, all communication is from sites to coordinator

• Focus on Euclidean norm (p = 2) in talk

• Non-negative vectors

• Just determine if Euclidean norm exceeds a threshold θ

The Most Naïve Thing to Do

• xi is Site i's current vector

• x = Σ_{i=1}^k xi

• Suppose Site i sees an update xi → xi + ej

• Send j to Coordinator with a certain probability that only depends on k and θ?
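A minimal sketch of this naive idea (illustrative, not the talk's protocol): on each update, the site forwards the coordinate with a fixed probability, here 1/k:

```python
import random

class NaiveSite:
    """Forward each update to the coordinator with a fixed probability."""
    def __init__(self, send_prob, seed=0):
        self.send_prob = send_prob
        self.rng = random.Random(seed)

    def update(self, j):
        # On update xi -> xi + ej, maybe send coordinate j upstream
        if self.rng.random() < self.send_prob:
            return j        # message to the coordinator
        return None         # stay silent

k = 10
site = NaiveSite(send_prob=1.0 / k)
msgs = [site.update(j) for j in [3, 3, 7, 3, 1] * 20]
print(sum(m is not None for m in msgs), "of", len(msgs), "updates sent")
```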

Sample and Send

[Figure: k sites, each holding k unit coordinates; the coordinator C must distinguish disjoint supports, giving |x|_2^2 = k^2, from supports shared between pairs of sites, giving |x|_2^2 = 2k^2]

• Send each update with probability at least 1/k

• Communication = O(k), so okay

• Suppose x has k^4 coordinates that are 1, and may have a unique coordinate which is k^2, occurring k times on each site

– Send each update with probability 1/k^2

– Will find the large coordinate

– But communication is Ω(k^2)

What Is Happening?

• Sampling with probability ≈ 1/k^2 is good to get a few samples from the heavy item

• But all the light coordinates are in the way, making the communication Ω(k^2)

• Suppose we put a barrier of k, that is, sample with probability ≈ 1/k^2 but only send an item if it has occurred at least k times on a site

• Now communication is O(1) and we found the heavy coordinate

• But light coordinates also contribute to the overall |x|_2 value

• Sample at different scales with different barriers

• Use the public coin to create O(log n) groups T1, …, T_{log n} of the n input coordinates

• Tz contains n/2^z random coordinates

• Suppose Site i sees the update xi → xi + ej

• For each Tz containing j:
– If x_{i,j} > (θ/2^z)^{1/2}/k, then with probability (2^z/θ)^{1/2}·poly(ε^{-1} log n), send (j, z) to the coordinator

Algorithm for Euclidean Norm

• Expected communication Õ(k)

• If a group of coordinates contributes to |x|_2, there is a z for which a few coordinates in the group are sampled multiple times
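A minimal sketch of the site-side rule above (illustrative; the poly(ε^{-1} log n) factor and the coordinator's estimator are omitted, and k is assumed known to every site):

```python
import hashlib, math, random

def groups_containing(j, n):
    """Public-coin groups: T_z keeps each coordinate w.pr. 2^{-z} (hash as shared randomness)."""
    zs = []
    for z in range(int(math.log2(n)) + 1):
        h = hashlib.sha256(f"{z}:{j}".encode()).digest()
        if int.from_bytes(h[:8], "big") < 2**64 >> z:
            zs.append(z)
    return zs

def on_update(xi, j, n, k, theta, rng):
    """Site-side rule: after the update xi[j] += 1, maybe emit (j, z) messages."""
    xi[j] = xi.get(j, 0) + 1
    msgs = []
    for z in groups_containing(j, n):
        barrier = math.sqrt(theta / 2**z) / k            # send only past the barrier
        if xi[j] > barrier and rng.random() < math.sqrt(2**z / theta):
            msgs.append((j, z))
    return msgs

rng, xi, sent = random.Random(0), {}, []
for _ in range(200):                                     # coordinate 5 updated 200 times
    sent += on_update(xi, j=5, n=1024, k=10, theta=10_000, rng=rng)
print(len(sent), "messages sent, e.g.", sent[:3])
```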

Conclusions

• Improved communication lower and upper bounds for estimating |x|_p

• Implies tight lower bounds for estimating entropy, heavy hitters, quantiles

• Implications for the data stream model
– First lower bound for |x|_0 without Gap-Hamming
– Useful information cost lower bound for Gap-Hamming, or the protocol has very large communication
– Improve the Ω(n^{1-2/p}/ε^{2/p}) bound for estimating |x|_p in a stream to Ω(n^{1-2/p}/ε^{4/p})