Testing probability distributions using conditional samples

(when testers get to be picky)

Clément Canonne∗ Dana Ron† Rocco Servedio∗

∗Columbia University

†Tel-Aviv University

March 8th, 2013

Clement Canonne (Columbia University) Testing distributions with a COND oracle March 8th, 2013 1 / 35

Plan of the talk

1 Introduction

2 Testing Uniformity

3 Tools and subroutines

4 Back to uniformity

5 Conclusion


Background and motivation
What is distribution testing?

Property testing
Given a big, hidden “object” X that one can only access through local, expensive inspections (e.g., oracle queries), and a property P, the goal is to check, in a sublinear number of inspections, whether (a) X has the property or (b) X is “far” from all objects having the property.¹

Testing distributions (standard model)
X is an unknown probability distribution D over some N-element set; the testing algorithm has blackbox sample access to D.

¹ With respect to some specified metric, and a parameter ε > 0 given to the tester.

Distribution testing (1)
In more detail.

Distance criterion: total variation distance (∝ L1 distance)

\[ d_{TV}(D_1, D_2) \overset{\text{def}}{=} \frac{1}{2}\,\|D_1 - D_2\|_1 = \frac{1}{2} \sum_{i \in [N]} |D_1(i) - D_2(i)|. \]

Definition (Testing algorithm)

Let P be a property of distributions over [N], and ORACLE_D be some type of oracle which provides access to D. A q(ε, N)-query ORACLE testing algorithm for P is an algorithm T which, given ε, N as input parameters and oracle access to an ORACLE_D oracle, and for any distribution D over [N], makes at most q(ε, N) calls to ORACLE_D, and:

- if D ∈ P then, w.p. at least 2/3, T outputs ACCEPT;
- if d_TV(D, P) ≥ ε then, w.p. at least 2/3, T outputs REJECT.
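As a small illustration of the distance criterion, the total variation distance between two explicitly known distributions can be computed directly from its definition. This is only a sketch: the function name `total_variation` and the list-of-probabilities representation are assumptions made here, and a tester never has this explicit description of D, only oracle access.

```python
from typing import Sequence


def total_variation(d1: Sequence[float], d2: Sequence[float]) -> float:
    """Return d_TV(D1, D2) = (1/2) * sum_i |D1(i) - D2(i)|.

    Both arguments are assumed to be probability vectors over the same
    domain [N], given as explicit lists (an illustrative representation).
    """
    if len(d1) != len(d2):
        raise ValueError("both distributions must be over the same domain [N]")
    return 0.5 * sum(abs(p - q) for p, q in zip(d1, d2))


uniform = [0.25, 0.25, 0.25, 0.25]
skewed = [0.40, 0.30, 0.20, 0.10]
# Half of |0.15| + |0.05| + |0.05| + |0.15|, i.e. about 0.2.
print(total_variation(uniform, skewed))
```

With ε = 0.1, the distribution `skewed` above would fall in the "must reject" regime of a uniformity tester, since its distance to uniform exceeds ε.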

Distribution testing (2)
Comments

A few remarks
- the tester is randomized;
- “gray” area for d_TV(D, P) ∈ (0, ε);
- 2/3 is completely arbitrary;
- extends to several oracles and distributions;
- our measure is the sample complexity (not the running time).

Distribution testing (3)
Concrete example: testing uniformity

Property P (“being U, the uniform distribution over [N]”) ⇔ set S_P of distributions with this property (S_P = {U}).
Distance to P:

\[ d_{TV}(D, S_P) = \min_{D' \in S_P} d_{TV}(D, D') \overset{\text{here}}{=} d_{TV}(D, U) \]

General outline
1. Draw a bunch of samples from D;
2. “Process” them, for instance by counting the number of points drawn more than once (collisions);
3. Compare the result to what one would expect from the uniform distribution U;
4. Reject if it differs too much; accept otherwise.
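The four-step outline can be sketched as a collision-based test. This is a minimal sketch under stated assumptions, not the tester from the cited works: the names `collision_rate` and `collision_uniformity_test`, the callable `draw` standing in for sample access to D, and the threshold (1 + 2ε²)/N are all choices made here. The threshold sits halfway between the collision probability 1/N of the uniform distribution and the lower bound (1 + 4ε²)/N that holds for any D at total variation distance at least ε from uniform. The number of samples `m` is left to the caller.

```python
import random
from collections import Counter
from typing import Callable, Hashable, Sequence


def collision_rate(samples: Sequence[Hashable]) -> float:
    """Fraction of unordered pairs of samples that are equal (collisions)."""
    m = len(samples)
    if m < 2:
        raise ValueError("need at least two samples to form a pair")
    counts = Counter(samples)
    colliding_pairs = sum(c * (c - 1) // 2 for c in counts.values())
    return colliding_pairs / (m * (m - 1) / 2)


def collision_uniformity_test(draw: Callable[[], Hashable], n: int,
                              eps: float, m: int) -> str:
    """Steps 1-4 of the outline, with an assumed (illustrative) threshold."""
    samples = [draw() for _ in range(m)]      # 1. draw samples from D
    observed = collision_rate(samples)        # 2. count collisions
    threshold = (1 + 2 * eps ** 2) / n        # 3. compare with the uniform case
    return "ACCEPT" if observed <= threshold else "REJECT"  # 4. decide


rng = random.Random(0)
print(collision_uniformity_test(lambda: rng.randrange(100), n=100, eps=0.5, m=2000))
```

The collision statistic only becomes informative once enough pairs collide, which is where the √N dependence of the standard model comes from.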

Background and motivation
Well, it’s more or less settled.

Fact
In the standard sampling model, most (natural) properties are “hard” to test; that is, they require a strong dependence on N (at least Ω(√N)).

Example
Testing uniformity has Θ(√N/ε²) sample complexity [GR00, BFR+10, Pan08]; equivalence to a known distribution, Θ(√N/ε²) [BFF+01, Pan08]; equivalence of two unknown distributions, Ω(N^{2/3}) [BFR+10, Val11] (and an essentially matching upper bound)...

Our model

More power to the tester
In a lot of natural applications, the tester has more control over the “experiment” it is running – e.g., by tuning the conditions or the settings to influence the outcome, effectively restricting its range. This is not captured by the SAMP model; to mend this, we consider a new model where the testing algorithm can ask for a specific range of outcomes, and get a draw conditioned on it being in that domain.

Definition (COND oracle)
Fix a distribution D over [N]. A COND oracle for D, denoted COND_D, is defined as follows: the oracle is given as input a query set S ⊆ [N] that has D(S) > 0, and returns an element i ∈ S, where the probability that element i is returned is D_S(i) = D(i)/D(S), independently of all previous calls to the oracle.
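To experiment with the definition, a COND oracle can be simulated for a distribution that is known explicitly. The sketch below is an illustration only: the class name `CondOracle`, the 0-indexed domain {0, ..., N−1}, and the query counter are assumptions made here. A real tester would only see the answers, never the probability vector.

```python
import random
from typing import Iterable, Optional, Sequence


class CondOracle:
    """Simulates COND_D for an explicitly known D over {0, ..., N-1}."""

    def __init__(self, probs: Sequence[float], seed: Optional[int] = None):
        self.probs = list(probs)
        self.rng = random.Random(seed)
        self.queries = 0  # number of oracle calls made so far

    def cond(self, query_set: Iterable[int]) -> int:
        """Return i in S with probability D(i)/D(S); requires D(S) > 0."""
        support = sorted(set(query_set))
        weights = [self.probs[i] for i in support]
        if sum(weights) <= 0:
            raise ValueError("COND_D is undefined when D(S) = 0")
        self.queries += 1
        return self.rng.choices(support, weights=weights, k=1)[0]

    def samp(self) -> int:
        """SAMP_D is the special case S = [N]."""
        return self.cond(range(len(self.probs)))


oracle = CondOracle([0.5, 0.3, 0.2, 0.0], seed=1)
print(oracle.cond({1, 2}))  # returns 1 w.p. 0.6 and 2 w.p. 0.4
print(oracle.samp())        # an ordinary sample from D
```

Counting calls in `queries` mirrors the cost measure used throughout: the number of oracle queries, not the running time.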

Our model

Remark
- generalizes the SAMP oracle (S = [N]), but allows adaptiveness;
- variants of the (general) COND oracle, which only allow some specific types of subsets to be queried: PCOND (either [N] or sets {i, j}) and ICOND (only intervals);
- not defined for sets S with zero probability under D;
- similar model independently introduced by Chakraborty et al. [CFGM13].

Question
Do COND oracles enable more efficient testing algorithms than SAMP oracles? And what does it reveal about testing distributions?

Our results

Question
Do COND oracles enable more efficient testing algorithms than SAMP oracles?

Yes, they do.

Our results
Comparison of the COND and SAMP models on several testing problems

Problem                 | Oracle         | Our results              | Standard model
------------------------+----------------+--------------------------+----------------------------------
Is D uniform?           | COND_D         | Ω(1/ε²)                  | Θ(√N/ε²) [GR00, BFR+10, Pan08]
                        | PCOND_D        | O(1/ε²)                  |
                        | ICOND_D        | O(log³N/ε³)              |
                        |                | Ω(log N / log log N)     |
Is D = D∗?              | COND_D         | O(1/ε⁴)                  | Θ(√N/ε²) [BFF+01, Pan08]
                        | PCOND_D        | O(log⁴N/ε⁴)              |
                        |                | Ω(√(log N / log log N))  |
Are D1, D2 equivalent?  | COND_{D1,D2}   | O(log⁵N/ε⁴)              | O(N^{2/3}/ε^{8/3}) [BFR+10]
                        | PCOND_{D1,D2}  | O(log⁶N/ε²¹)             | Ω(N^{2/3}) [BFR+10, Val11]
How far is D from U?    | PCOND_D        | O(1/ε²⁰)                 | O((1/ε²) · N/log N) [VV11, VV10b]
                        |                |                          | Ω(N/log N) [VV11, VV10a]

Table: The upper bounds for the first 3 problems are for testing the property, while the last one involves estimating the total variation distance to uniformity to within an additive ±ε.

Rest of the talk

Plan for rest of talk:
- testing uniformity: an upper bound (with pairwise queries)
- testing uniformity: a lower bound
- introducing tools: Estimate-Neighborhood and Approx-Eval
- testing uniformity, again: a (glimpse at) interval queries.

Testing Uniformity (1)
Why bother with N?

Theorem (Testing Uniformity with PCOND)

There exists an O(1/ε²)-query PCOND_D tester for uniformity, i.e., it accepts w.p. at least 2/3 if D = U and rejects w.p. at least 2/3 if d_TV(D, U) ≥ ε.

High-level idea
Intuitively, if D is ε-far from uniform, it must have (a) a lot of “very light” points; and (b) a lot of weight on “very heavy” points. Sampling O(1/ε) points both uniformly and according to D, we obtain whp both light and heavy ones, and use PCOND to compare them.
Not good enough (O(1/ε⁴) queries): refine this approach to get O(1/ε²).

Testing Uniformity (2)
Getting our hands dirty.

Algorithm 1: PCOND_D-Test-Uniform
 1: Set t = log(4/ε) + 1.
 2: Select q = Θ(1) points i_1, ..., i_q uniformly    ▷ Reference points
 3: for j = 1 to t do
 4:   Call the SAMP_D oracle s_j = Θ(2^j t) times to obtain points h_1, ..., h_{s_j} distributed according to D    ▷ Try to get a heavy point
 5:   Draw s_j points ℓ_1, ..., ℓ_{s_j} uniformly from [N]    ▷ Try to get a light point
 6:   for all pairs (x, y) = (i_r, h_{r′}) and (x, y) = (i_r, ℓ_{r′}) do
 7:     Call Compare_D({x}, {y}, Θ(ε 2^j), 2, exp(−Θ(t))).
 8:     if it does not return a value in [1 − 2^{j−5} ε/4, 1 + 2^{j−5} ε/4] then
 9:       output REJECT (and exit).
10:     end if
11:   end for
12: end for
13: Output ACCEPT

Testing Uniformity (3)

Proof (Outline).
Sample complexity: by the setting of t, q and the calls to Compare.
Completeness: unless Compare fails to output a correct value, no rejection.
Soundness: suppose D is ε-far from U; refinement of the previous approach by bucketing low and high points:

\[ H_j \overset{\text{def}}{=} \left\{ h \;\middle|\; \left(1 + 2^{j-1}\frac{\varepsilon}{4}\right)\frac{1}{N} \le D(h) < \left(1 + 2^{j}\frac{\varepsilon}{4}\right)\frac{1}{N} \right\} \]

\[ L_j \overset{\text{def}}{=} \left\{ \ell \;\middle|\; \left(1 - 2^{j}\frac{\varepsilon}{4}\right)\frac{1}{N} < D(\ell) \le \left(1 - 2^{j-1}\frac{\varepsilon}{4}\right)\frac{1}{N} \right\} \]

for j ∈ [t − 1], with also H_0, L_0, H_t, L_t to cover everything; each iteration of the loop on line 3 “focuses” on a particular bucket.

+ Chernoff and union bounds.

Testing Uniformity – Lower Bound (1)

Theorem (Testing Uniformity with COND)

Any COND_D algorithm for testing whether D = U versus d_TV(D, U) ≥ ε must make Ω(1/ε²) queries.

Remark
As PCOND is a restriction of COND, the previous upper bound was essentially optimal.

Testing Uniformity – Lower Bound (2)

High-level idea.
Reduce it to the problem of distinguishing between a fair and a biased coin, by defining a “no-instance” D_no s.t.

1. D_no is ε-far from U;
2. any q-query tester A which distinguishes D_no from U can be turned into a tester A′ distinguishing between (1) a sequence of q fair coin tosses and (2) a sequence of q (4ε)-biased coin tosses.

However, it is known that distinguishing between these two scenarios requires Ω(1/ε²) coin tosses.

Figure: The no-instance D_no. (Plot of D_no(i) against i; the vertical axis marks the two levels (1+2ε)/N and (1−2ε)/N, and the horizontal axis marks N/2.)
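The first requirement on the no-instance, that it be ε-far from U, can be checked by direct computation. The sketch below uses exact rational arithmetic; the function names are made up here, and placing the heavier level (1+2ε)/N on the first half of the domain is an assumption (the distance to uniform is the same either way).

```python
from fractions import Fraction
from typing import List


def no_instance(n: int, eps: Fraction) -> List[Fraction]:
    """D_no over [n], n even: (1+2eps)/n on one half, (1-2eps)/n on the other.

    Which half is the heavy one is an assumption made for illustration.
    """
    if n % 2 != 0 or not (0 < eps <= Fraction(1, 2)):
        raise ValueError("need n even and 0 < eps <= 1/2")
    heavy = (1 + 2 * eps) / n
    light = (1 - 2 * eps) / n
    return [heavy] * (n // 2) + [light] * (n // 2)


def tv_to_uniform(d: List[Fraction]) -> Fraction:
    """Exact total variation distance between d and the uniform distribution."""
    n = len(d)
    return sum(abs(p - Fraction(1, n)) for p in d) / 2


d_no = no_instance(10, Fraction(1, 10))
print(sum(d_no))            # total mass, equal to 1
print(tv_to_uniform(d_no))  # equal to eps
```

Each of the N points deviates from 1/N by exactly 2ε/N, so the L1 distance is 2ε and the total variation distance is ε.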

Testing Uniformity – Lower Bound (3)
The reduction: how to simulate COND_D from coin tosses

To run A from A′, we must simulate COND_D (D either U or D_no) to provide the former with samples, given the corresponding coin tosses.

At step 1 ≤ t ≤ q, A chooses to query S ⊂ [N] (according to the (t − 1) previous answers it got from the simulation). A′ behaves as follows:

- sets S_0 := S ∩ [1, N/2], S_1 := S ∩ [N/2 + 1, N];
- gets bit b_t, and draws σ ∼ Bern(u_t) if b_t = 1, and σ ∼ Bern(v_t) otherwise (†);
- draws s u.a.r. from S_σ;
- gives (S, s) to A.

(†) for a right choice of u_t, v_t depending on |S_0|, |S_1|, ε.

Testing Uniformity – Summary

- Θ(√N/ε²) with SAMP: counting collisions [GR00, BFR+10, Pan08]
- O(1/ε²) with PCOND: comparing random pairs of points
- Ω(1/ε²) with COND: reducing to fair vs. biased coin

Remark
Testing with ICOND will require a logarithmic dependence on N.

Building tools (1)

Compare
Low-level procedure: compares the relative weight of sets X, Y, given some accuracy parameter η.

Estimate-Neighborhood
On input a point i ∈ [N] and parameter γ, estimates the weight under D of the γ-neighborhood of i – that is, points with probability mass within a factor (1 + γ) of D(i).

Approx-Eval
Given i ∈ [N] and accuracy parameter η, returns an approximation of D(i) – succeeds whp for most points i.

Building tools (2)
“Comparison is the death of joy.” – Mark Twain.

The low-level tool Compare

Given as input two disjoint subsets X, Y, parameters η ∈ (0, 1], K ≥ 1, and δ ∈ (0, 1/2], and COND access to D, the procedure Compare outputs either a value ρ > 0, High, or Low, such that:

- If D(X)/K ≤ D(Y) ≤ K · D(X) then w.p. 1 − δ it outputs a value ρ ∈ [1 − η, 1 + η] D(Y)/D(X);
- If D(Y) > K · D(X) then w.p. 1 − δ it outputs either High or a value ρ ∈ [1 − η, 1 + η] D(Y)/D(X);
- If D(Y) < D(X)/K then w.p. 1 − δ it outputs either Low or a value ρ ∈ [1 − η, 1 + η] D(Y)/D(X).

Compare performs O(K log(1/δ)/η²) COND queries on X ∪ Y.
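One way to picture a procedure meeting this specification is to query COND on X ∪ Y repeatedly and look at the fraction of answers that land in Y. The sketch below is an illustration under stated assumptions, not the procedure from the talk: the helper `make_cond` (a simulator for an explicitly known D over {0, ..., N−1}), the constant `c` in the number of queries, and the cut-off (2/3)/(K+1) used to decide between Low, High and a numeric answer are all choices made here.

```python
import math
import random
from typing import Callable, Iterable, Sequence, Set, Union


def make_cond(probs: Sequence[float], seed: int = 0) -> Callable[[Iterable[int]], int]:
    """Hypothetical COND_D simulator for an explicitly known D."""
    rng = random.Random(seed)

    def cond(query_set: Iterable[int]) -> int:
        support = sorted(query_set)
        weights = [probs[i] for i in support]
        if sum(weights) <= 0:
            raise ValueError("COND_D is undefined when D(S) = 0")
        return rng.choices(support, weights=weights, k=1)[0]

    return cond


def compare(cond: Callable[[Iterable[int]], int], x: Set[int], y: Set[int],
            eta: float, k: float, delta: float, c: float = 10.0) -> Union[float, str]:
    """Estimate D(Y)/D(X) using O(K log(1/delta) / eta^2) COND queries on X ∪ Y."""
    if x & y:
        raise ValueError("X and Y must be disjoint")
    m = math.ceil(c * k * math.log(1 / delta) / eta ** 2)  # c is an assumed constant
    union = x | y
    hits_y = sum(1 for _ in range(m) if cond(union) in y)
    mu = hits_y / m                # empirical estimate of D(Y) / D(X ∪ Y)
    cutoff = (2 / 3) / (k + 1)     # assumed cut-off for "ratio out of range"
    if mu < cutoff:
        return "Low"               # Y looks much lighter than X
    if 1 - mu < cutoff:
        return "High"              # Y looks much heavier than X
    return mu / (1 - mu)           # estimate of D(Y)/D(X)


cond = make_cond([0.2, 0.4, 0.4], seed=0)
print(compare(cond, {0}, {1}, eta=0.1, k=4, delta=0.01))  # close to D(Y)/D(X) = 2
```

When the true ratio lies in [1/K, K], the quantity D(Y)/D(X ∪ Y) lies in [1/(K+1), K/(K+1)], so with enough queries the empirical fraction stays away from both cut-offs and a numeric estimate is returned.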

Building tools (3)

[Figure: the possible outputs of Compare on the sets X and Y, depending on D(X) and D(Y): Low, a value ρ with ρ · D(X) ≈ D(Y), or High.]

Building tools (4)

Definition (γ-Neighborhood)

\[ U_\gamma(x) \overset{\text{def}}{=} \left\{ y \in [N] : \frac{1}{1+\gamma}\, D(x) \le D(y) \le (1+\gamma)\, D(x) \right\}, \quad \gamma \in [0, 1] \]

Goal
Given a point x ∈ [N] and a parameter γ, get an approximation of D(U_γ(x)) – i.e., “how much weight does D put on points like x?”
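For an explicitly known distribution, the γ-neighborhood and its weight follow directly from the definition, which gives a reference value to compare an estimate against. This is a sketch for intuition only: the function names and the 0-indexed list representation of D are assumptions made here, and the point of Estimate-Neighborhood is precisely to approximate this quantity without knowing D.

```python
from typing import Sequence, Set


def gamma_neighborhood(probs: Sequence[float], x: int, gamma: float) -> Set[int]:
    """U_gamma(x): points y with D(x)/(1+gamma) <= D(y) <= (1+gamma) * D(x)."""
    lo = probs[x] / (1 + gamma)
    hi = probs[x] * (1 + gamma)
    return {y for y, p in enumerate(probs) if lo <= p <= hi}


def neighborhood_weight(probs: Sequence[float], x: int, gamma: float) -> float:
    """D(U_gamma(x)): how much weight D puts on points like x."""
    return sum(probs[y] for y in gamma_neighborhood(probs, x, gamma))


d = [0.10, 0.12, 0.30, 0.48]
print(sorted(gamma_neighborhood(d, 0, 0.5)))  # points with mass within a factor 1.5 of D(0)
print(neighborhood_weight(d, 0, 0.5))
```

In this example the neighborhood of point 0 with γ = 0.5 consists of points 0 and 1, for a total weight of about 0.22.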

Building tools (5)

The (slightly) higher-level subroutine Estimate-Neighborhood

Given as input a point x, parameters γ, β, η, δ ∈ (0, 1/2] and PCOND_D access, the procedure Estimate-Neighborhood outputs a pair (w, α) ∈ [0, 1] × (γ, 2γ) such that, for θ small:

1. If D(U_α(x)) ≥ β, then w.p. 1 − δ we have w ∈ [1 − η, 1 + η] · D(U_α(x)), and D(U_{α+θ}(x) \ U_α(x)) ≤ ηβ/16;
2. If D(U_α(x)) < β, then w.p. 1 − δ we have w ≤ (1 + η) · β, and D(U_{α+θ}(x) \ U_α(x)) ≤ ηβ/16.

Estimate-Neighborhood performs O(log(1/δ) / (γ² η⁴ β³ δ²)) queries.

Remark
It does not exactly estimate D(U_γ(x)): the guarantee is for U_α(x), for some α ∈ (γ, 2γ).

[Figure: nested neighborhoods, with labels U_{2γ} and U_{α+θ} and the annotation “≃ no weight”.]

Building tools (6)

EVAL oracle

A δ-EVAL_D simulator for D is a randomized procedure ORACLE such that w.p. 1 − δ the output of ORACLE on input i∗ ∈ [N] is D(i∗).

Page 46: Testing probability distributions using conditional samplesccanonne/files/talks/cond-2013-03-08.pdf · Testing probability distributions using conditional samples (when testers get

Building tools (6)

(Approximate) EVAL oracle

An (ε, δ)-approximate EVAL_D simulator for D is a randomized procedure ORACLE such that, w.p. 1 − δ, the output of ORACLE on input i∗ ∈ [N] is a value α ∈ [0, 1] such that α ∈ [1 − ε, 1 + ε] · D(i∗).


Relaxed to allow a small exceptional set: an (ε, δ)-approximate EVAL_D simulator for D is a randomized procedure ORACLE such that, for each ε, there is a fixed set S(ε) ⊊ [N] with D(S(ε)) < ε for which the following holds. For all i∗ ∈ [N], ORACLE(i∗) is either a value α ∈ [0, 1] or Unknown, and furthermore:

(i) If i∗ ∉ S(ε), then w.p. 1 − δ the output of ORACLE on input i∗ is a value α ∈ [0, 1] such that α ∈ [1 − ε, 1 + ε] · D(i∗);

(ii) If i∗ ∈ S(ε), then w.p. 1 − δ the procedure either outputs Unknown or outputs a value α ∈ [0, 1] such that α ∈ [1 − ε, 1 + ε] · D(i∗).
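To make the two cases of the definition concrete, here is a minimal Python sketch that checks whether a single answer of ORACLE is acceptable. It is only an illustration under stated assumptions: the distribution D and the set S(ε) are given explicitly as a dict and a set (which a real tester never has), and the names `answer_is_valid`, `exceptional_set_ok` and `UNKNOWN` are hypothetical, not taken from the talk. The "w.p. 1 − δ" part of the definition is a statement about how often such answers occur and is not modelled here.

```python
UNKNOWN = None  # hypothetical stand-in for the special answer "Unknown"


def exceptional_set_ok(D, S_eps, eps):
    """S(eps) must be a proper subset of the domain with D(S(eps)) < eps."""
    return set(S_eps) < set(D) and sum(D[i] for i in S_eps) < eps


def answer_is_valid(D, S_eps, i, answer, eps):
    """Check one ORACLE answer on input i against cases (i) and (ii)."""
    if answer is UNKNOWN:
        # "Unknown" is only allowed on the exceptional set S(eps).
        return i in S_eps
    # Otherwise the answer must lie in [0, 1] and be a multiplicative
    # (1 +/- eps)-approximation of D(i), whether or not i is in S(eps).
    in_unit_interval = 0.0 <= answer <= 1.0
    close = (1 - eps) * D[i] <= answer <= (1 + eps) * D[i]
    return in_unit_interval and close


# Toy explicit distribution on {1, 2, 3, 4} (assumed for illustration only).
D = {1: 0.5, 2: 0.3, 3: 0.15, 4: 0.05}
S = {4}
print(exceptional_set_ok(D, S, eps=0.1))
print(answer_is_valid(D, S, 1, 0.52, eps=0.1))
print(answer_is_valid(D, S, 1, UNKNOWN, eps=0.1))
```

The only difference between the two cases is that Unknown is an acceptable answer on S(ε); any numerical answer must be accurate everywhere.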


The high-level blackbox Approx-Eval

There is an algorithm Approx-Eval which uses O( (log N)⁵ · (log(1/δ))² / ε³ ) calls to COND_D, and is an (ε, δ)-approximate EVAL_D simulator.


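[Figure: Approx-Eval, with parameter ε and access to COND_D, receives an input i∗ ∈ [N]. If i∗ ∉ S(ε), it outputs D(i); if i∗ ∈ S(ε), it outputs either “Unknown” or D(i).]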


Applications

Testing equivalence of two unknown distributions D1, D2

Blackbox access to D1 and D2 (two oracles); distinguish D1 = D2 vs. dTV(D1, D2) ≥ ε.

In the language of property testing: S_P = { (D, D) | D distribution }, with metric over pairs of distributions d((D, D′), (P, P′)) := dTV(D, P) + dTV(D′, P′).
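As a small worked illustration of this metric, the following Python sketch computes dTV and the distance between pairs for explicitly given distributions. It assumes the standard definition of total variation distance (half the ℓ1 distance), and the dict representation and the function names `tv_distance` and `pair_distance` are choices made here for illustration only.

```python
def tv_distance(p, q):
    """Total variation distance: half the l1 distance between p and q.

    p and q are dicts mapping domain elements to probabilities;
    missing elements are treated as having probability 0.
    """
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(i, 0.0) - q.get(i, 0.0)) for i in support)


def pair_distance(pair_a, pair_b):
    """d((D, D'), (P, P')) = dTV(D, P) + dTV(D', P')."""
    (d, d_prime), (p, p_prime) = pair_a, pair_b
    return tv_distance(d, p) + tv_distance(d_prime, p_prime)


uniform = {1: 0.5, 2: 0.5}
point_mass = {1: 1.0}

print(tv_distance(uniform, point_mass))
print(pair_distance((uniform, point_mass), (uniform, uniform)))
```

By the triangle inequality, dTV(D1, D) + dTV(D2, D) ≥ dTV(D1, D2) for every D, with equality at D = D1, so the distance from a pair (D1, D2) to S_P is exactly dTV(D1, D2). The two formulations of the equivalence testing problem therefore agree.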

Two different approaches:

1 with PCOND and Estimate-Neighborhood – finding “representative” points for both distributions;

2 with COND and Approx-Eval – adapting an EVAL algorithm from [RS09].

Other uses: estimating distance to uniformity (Estimate-Neighborhood), testing monotonicity³ (Approx-Eval)...

³ (extension of the original results)


Testing Uniformity with ICOND

Main message
ICOND algorithms are weaker than PCOND ones for this: while poly(log N, 1/ε) queries are enough, Ω(log N) are necessary.

Overview
Upper bound: sort of binary descent on random points (custom-tailored version of Approx-Eval), to spot deviations from 1/N;
Lower bound: family of “no-instances” + LB against non-adaptive + hybrid argument to get LB against adaptive.


Upper Bound

[Figure: binary tree of intervals. The root [N] splits into [1, N/2] and [N/2 + 1, N]; [1, N/2] splits into [1, N/4] and [N/4 + 1, N/2]; and so on, down to [i − 2, i − 1] and [i, i + 1], and finally to the leaves i and i + 1.]

Figure: Idea of the “binary descent” on i: get an estimate of D(i) by multiplying estimates at each branching, each time rejecting if the ratio between the weights of the two subintervals is far from 1/2. Repeat for Θ(1/ε) points drawn from D.
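The descent in the figure can be sketched in Python as follows. This is a toy simulation under several assumptions, not the algorithm from the talk: N is a power of two, the ICOND oracle is simulated from an explicitly known D with strictly positive weights, the "ratio" is read as the fraction of the current interval's weight that falls in its left half, and the sample count and tolerance are illustrative values rather than the parameters needed for the stated query bounds. The names `make_icond` and `binary_descent` are hypothetical.

```python
import random


def make_icond(D, rng):
    """Simulated ICOND oracle for a known D (list of weights for 1..N).

    icond(a, b) returns a draw from D conditioned on the interval [a, b].
    """
    def icond(a, b):
        points = list(range(a, b + 1))
        weights = [D[j - 1] for j in points]
        return rng.choices(points, weights=weights, k=1)[0]
    return icond


def binary_descent(icond, N, i, samples, tol):
    """Estimate D(i) along the path from [1, N] down to i.

    Returns the product of the estimated branching fractions, or None
    (meaning "reject") if some fraction is farther than tol from 1/2.
    """
    lo, hi = 1, N
    estimate = 1.0
    while lo < hi:
        mid = (lo + hi) // 2
        # Estimate the fraction of the weight of [lo, hi] lying in [lo, mid].
        hits_left = sum(1 for _ in range(samples) if icond(lo, hi) <= mid)
        frac_left = hits_left / samples
        if abs(frac_left - 0.5) > tol:
            return None  # deviation from uniform spotted at this branching
        if i <= mid:
            estimate *= frac_left
            hi = mid
        else:
            estimate *= 1.0 - frac_left
            lo = mid + 1
    return estimate


rng = random.Random(0)
N = 16
uniform = [1.0 / N] * N
skewed = [0.9 / 8] * 8 + [0.1 / 8] * 8  # 90% of the weight on the left half

est_uniform = binary_descent(make_icond(uniform, rng), N, 5, samples=2000, tol=0.1)
est_skewed = binary_descent(make_icond(skewed, rng), N, 5, samples=2000, tol=0.1)
print(est_uniform, est_skewed)
```

On the uniform distribution every branching fraction is close to 1/2, so the product over the log N levels is close to 1/N. On the skewed example the very first branching is far from 1/2 and the descent rejects. A full tester would repeat this for Θ(1/ε) points i drawn from D, which in this simulation could be obtained as `icond(1, N)`.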


Conclusion

new model for studying probability distributions
arises naturally in a number of settings
allows significantly more query-efficient algorithms

generalizing to other structured domains? (e.g., the Boolean hypercube {0, 1}ⁿ)
what about distribution learning in this framework?
more properties? (entropy, independence, monotonicity†...)


The end.

Thank you.

The full version of this work is available online (arXiv:1211.2664).


References

[BFF+01] T. Batu, E. Fischer, L. Fortnow, R. Kumar, R. Rubinfeld, and P. White, Testing random variables for independence and identity, Proceedings of FOCS, 2001, pp. 442–451.

[BFR+00] T. Batu, L. Fortnow, R. Rubinfeld, W. D. Smith, and P. White, Testing that distributions are close, Proceedings of FOCS, 2000, pp. 189–197.

[BFR+10] T. Batu, L. Fortnow, R. Rubinfeld, W. D. Smith, and P. White, Testing closeness of discrete distributions, Tech. Report abs/1009.5397, 2010. This is a long version of [BFR+00].

[CFGM13] S. Chakraborty, E. Fischer, Y. Goldhirsh, and A. Matsliah, On the power of conditional samples in distribution testing, Proceedings of ITCS, 2013. To appear.

[GR00] O. Goldreich and D. Ron, On testing expansion in bounded-degree graphs, Tech. Report TR00-020, ECCC, 2000.

[Pan08] L. Paninski, A coincidence-based test for uniformity given very sparsely sampled discrete data, IEEE-IT 54 (2008), no. 10, 4750–4755.

[RS09] R. Rubinfeld and R. A. Servedio, Testing monotone high-dimensional distributions, RSA 34 (2009), no. 1, 24–44.

[Val11] P. Valiant, Testing symmetric properties of distributions, SICOMP 40 (2011), no. 6, 1927–1968.

[VV10a] G. Valiant and P. Valiant, A CLT and tight lower bounds for estimating entropy, Tech. Report TR10-179, ECCC, 2010.

[VV10b] G. Valiant and P. Valiant, Estimating the unseen: A sublinear-sample canonical estimator of distributions, Tech. Report TR10-180, ECCC, 2010.

[VV11] G. Valiant and P. Valiant, Estimating the unseen: an n/log(n)-sample estimator for entropy and support size, shown optimal via new CLTs, Proceedings of STOC, 2011, pp. 685–694. See also [VV10a] and [VV10b].
