Testing probability distributions using conditional samples

(when testers get to be picky)

Clement Canonne∗ Dana Ron† Rocco Servedio∗

∗Columbia University

†Tel-Aviv University

March 8th, 2013

Clement Canonne (Columbia University) Testing distributions with a COND oracle March 8th, 2013 1 / 35

Plan of the talk

1 Introduction

2 Testing Uniformity

3 Tools and subroutines

4 Back to uniformity

5 Conclusion


Background and motivation
What is distribution testing?

Property testing
Given a big, hidden “object” X that one can only access through local, expensive inspections (e.g., oracle queries), and a property P, the goal is to check with a sublinear number of inspections whether (a) X has the property or (b) X is “far” from all objects having the property.¹

Testing distributions (standard model)
X is an unknown probability distribution D over some N-element set; the testing algorithm has blackbox sample access to D.

¹ With respect to some specified metric, and a parameter ε > 0 given to the tester.


Distribution testing (1)
In more detail.

Distance criterion: total variation distance (∝ L1 distance)

\[
d_{TV}(D_1, D_2) \stackrel{\mathrm{def}}{=} \frac{1}{2}\,\|D_1 - D_2\|_1 = \frac{1}{2}\sum_{i \in [N]} |D_1(i) - D_2(i)|.
\]
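For intuition, the distance above can be computed directly when both distributions are given explicitly as probability vectors. This is only an illustrative sketch: the function name `total_variation` and the 0-indexed list representation are assumptions made here, and a tester of course never sees D explicitly.

```python
from typing import Sequence

def total_variation(d1: Sequence[float], d2: Sequence[float]) -> float:
    """Half the L1 distance between two probability vectors over the same domain."""
    if len(d1) != len(d2):
        raise ValueError("both distributions must be over the same N-element domain")
    return 0.5 * sum(abs(p - q) for p, q in zip(d1, d2))

uniform = [0.25, 0.25, 0.25, 0.25]
skewed = [0.5, 0.5, 0.0, 0.0]
print(total_variation(uniform, skewed))  # 0.5
```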

Definition (Testing algorithm)

Let P be a property of distributions over [N], and ORACLE_D be some type of oracle which provides access to D. A q(ε, N)-query ORACLE testing algorithm for P is an algorithm T which, given ε, N as input parameters and oracle access to an ORACLE_D oracle, and for any distribution D over [N], makes at most q(ε, N) calls to ORACLE_D, and:

- if D ∈ P then, w.p. at least 2/3, T outputs ACCEPT;
- if d_TV(D, P) ≥ ε then, w.p. at least 2/3, T outputs REJECT.


Distribution testing (2)
Comments

A few remarks
- the tester is randomized;
- “gray” area for d_TV(D, P) ∈ (0, ε);
- 2/3 is completely arbitrary;
- extends to several oracles and distributions;
- our measure is the sample complexity (not the running time).





Distribution testing (3)
Concrete example: testing uniformity

Property P (“being U, the uniform distribution over [N]”) ⇔ set S_P of distributions with this property (S_P = {U})

Distance to P:

\[
d_{TV}(D, S_P) = \min_{D' \in S_P} d_{TV}(D, D') \stackrel{\mathrm{here}}{=} d_{TV}(D, U)
\]

General outline
1. Draw a bunch of samples from D;
2. “Process” them, for instance by counting the number of points drawn more than once (collisions);
3. Compare the result to what one would expect from the uniform distribution U;
4. Reject if it differs too much; accept otherwise.
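The four steps of the outline can be sketched as follows, using the collision rate as the statistic. This is a minimal sketch, not the tuned testers of the cited works: the names `count_collisions` and `collision_uniformity_test`, the 0-indexed domain, and the halfway threshold are assumptions for illustration. The threshold uses the standard fact that d_TV(D, U) ≥ ε forces Σ_i D(i)² ≥ (1 + 4ε²)/N, whereas the uniform distribution has collision probability exactly 1/N.

```python
import random
from collections import Counter

def count_collisions(samples):
    """Number of unordered pairs of draws that hit the same point."""
    return sum(c * (c - 1) // 2 for c in Counter(samples).values())

def collision_uniformity_test(draw, n, eps, m):
    """Steps 1-4 of the outline, with collisions as the statistic.

    draw: zero-argument callable returning one sample from D (SAMP access).
    m:    number of samples, chosen by the caller (not tuned here); needs m >= 2.
    """
    samples = [draw() for _ in range(m)]                  # step 1: draw samples
    rate = count_collisions(samples) / (m * (m - 1) / 2)  # step 2: collision rate
    # Step 3: under U the expected rate is 1/n; if d_TV(D, U) >= eps it is
    # at least (1 + 4 eps^2) / n. Cutting halfway is an illustrative choice.
    threshold = (1 + 2 * eps ** 2) / n
    return "ACCEPT" if rate <= threshold else "REJECT"    # step 4: decide

rng = random.Random(0)
n = 1000
print(collision_uniformity_test(lambda: rng.randrange(n), n, eps=0.5, m=2000))
```

For the verdict to be reliable, m has to grow like √N/ε² up to constants, which is exactly the dependence on N that the rest of the talk tries to avoid.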


Background and motivation
Well, it’s more or less settled.

Fact
In the standard sampling model, most (natural) properties are “hard” to test; that is, they require a strong dependence on N (at least Ω(√N)).

Example
Testing uniformity has Θ(√N/ε²) sample complexity [GR00, BFR+10, Pan08]; equivalence to a known distribution, Θ(√N/ε²) [BFF+01, Pan08]; equivalence of two unknown distributions, Ω(N^(2/3)) [BFR+10, Val11] (and an essentially matching upper bound)...




Our model

More power to the tester
In a lot of natural applications, the tester has more control over the “experiment” it is running, e.g., by tuning the conditions or the settings to influence the outcome, effectively restricting its range. This is not captured by the SAMP model; to mend this, we consider a new model where the testing algorithm can ask for a specific range of outcomes, and get a draw conditioned on it being in that domain.

Definition (COND oracle)
Fix a distribution D over [N]. A COND oracle for D, denoted COND_D, is defined as follows: the oracle is given as input a query set S ⊆ [N] that has D(S) > 0, and returns an element i ∈ S, where the probability that element i is returned is D_S(i) = D(i)/D(S), independently of all previous calls to the oracle.
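To make the definition concrete, here is a small simulator of the oracle for a distribution that is known to the simulator (but would be hidden from a tester). The class name `CondOracle`, its methods, and the 0-indexed domain are assumptions of this sketch, not notation from the talk.

```python
import random

class CondOracle:
    """Simulates COND_D for an explicitly given probability vector D."""

    def __init__(self, probs, seed=None):
        self.probs = list(probs)
        self.rng = random.Random(seed)
        self.queries = 0  # number of oracle calls made so far

    def cond(self, subset):
        """Return i in S with probability D(i)/D(S); undefined if D(S) = 0."""
        elems = sorted(set(subset))
        weights = [self.probs[i] for i in elems]
        if sum(weights) <= 0:
            raise ValueError("COND is not defined for query sets with D(S) = 0")
        self.queries += 1
        # random.choices normalizes the weights, i.e. it samples from D_S.
        return self.rng.choices(elems, weights=weights, k=1)[0]

    def samp(self):
        """The standard SAMP oracle is the special case S = [N]."""
        return self.cond(range(len(self.probs)))

oracle = CondOracle([0.5, 0.5, 0.0, 0.0], seed=1)
print(oracle.cond({0, 2}))  # 0, since element 2 has zero mass
```

Counting calls in `queries` mirrors the complexity measure used throughout: only the number of oracle calls matters, not the running time.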




Our model

Remark
- generalizes the SAMP oracle (S = [N]), but allows adaptiveness;
- variants of the (general) COND oracle, which only allow some specific types of subsets to be queried: PCOND (either [N] or sets {i, j}) and ICOND (only intervals);
- not defined for sets S with zero probability under D;
- similar model independently introduced by Chakraborty et al. [CFGM13].

Question
Do COND oracles enable more efficient testing algorithms than SAMP oracles? And what does it reveal about testing distributions?





Our results

Question
Do COND oracles enable more efficient testing algorithms than SAMP oracles?

Yes, they do.




Our results
Comparison of the COND and SAMP models on several testing problems

Problem                | Our results                                     | Standard model
Is D uniform?          | COND_D: Ω(1/ε²)                                 | Θ(√N/ε²) [GR00, BFR+10, Pan08]
                       | PCOND_D: O(1/ε²)                                |
                       | ICOND_D: O(log³N/ε³), Ω(log N / log log N)      |
Is D = D*?             | COND_D: O(1/ε⁴)                                 | Θ(√N/ε²) [BFF+01, Pan08]
                       | PCOND_D: O(log⁴N/ε⁴), Ω(√(log N / log log N))   |
Are D1, D2 equivalent? | COND_{D1,D2}: O(log⁵N/ε⁴)                       | O(N^(2/3)/ε^(8/3)) [BFR+10]
                       | PCOND_{D1,D2}: O(log⁶N/ε²¹)                     | Ω(N^(2/3)) [BFR+10, Val11]
How far is D from U?   | PCOND_D: O(1/ε²⁰)                               | O((1/ε²) · N/log N) [VV11, VV10b]
                       |                                                 | Ω(N/log N) [VV11, VV10a]

Table: The upper bounds for the first 3 problems are for testing the property, while the last one involves estimating the total variation distance to uniformity to within an additive ±ε.


Rest of the talk

Plan for rest of talk:
- testing uniformity: an upper bound (with pairwise queries)
- testing uniformity: a lower bound
- introducing tools: Estimate-Neighborhood and Approx-Eval
- testing uniformity, again: a (glimpse at) interval queries.


Testing Uniformity (1)
Why bother with N?

Theorem (Testing Uniformity with PCOND)

There exists an O(1/ε²)-query PCOND_D tester for uniformity, i.e., it accepts w.p. at least 2/3 if D = U and rejects w.p. at least 2/3 if d_TV(D, U) ≥ ε.

High-level idea
Intuitively, if D is ε-far from uniform, it must have (a) a lot of “very light” points; and (b) a lot of weight on “very heavy” points. Sampling O(1/ε) points both uniformly and according to D, we obtain whp both light and heavy ones, and use PCOND to compare them.
Not good enough (O(1/ε⁴) queries): refine this approach to get O(1/ε²).







Testing Uniformity (2)Getting our hands dirty.

Algorithm 1: PCOND_D-Test-Uniform
 1: Set t = log(4/ε) + 1.
 2: Select q = Θ(1) points i_1, ..., i_q uniformly                      ▷ Reference points
 3: for j = 1 to t do
 4:     Call the SAMP_D oracle s_j = Θ(2^j t) times to obtain points h_1, ..., h_{s_j} distributed according to D      ▷ Try to get a heavy point
 5:     Draw s_j points ℓ_1, ..., ℓ_{s_j} uniformly from [N]            ▷ Try to get a light point
 6:     for all pairs (x, y) = (i_r, h_{r′}) and (x, y) = (i_r, ℓ_{r′}) do
 7:         Call Compare_D({x}, {y}, Θ(ε 2^j), 2, exp(−Θ(t))).
 8:         if it does not return a value in [1 − 2^(j−5) ε/4, 1 + 2^(j−5) ε/4] then
 9:             output REJECT (and exit).
10:         end if
11:     end for
12: end for
13: Output ACCEPT


Testing Uniformity (3)

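Proof (Outline).
Sample complexity: by the setting of t, q and the calls to Compare.
Completeness: unless Compare fails to output a correct value, no rejection.
Soundness: suppose D is ε-far from U; refinement of the previous approach by bucketing low and high points:

\[
H_j \stackrel{\mathrm{def}}{=} \left\{ h \;\middle|\; \left(1 + 2^{j-1}\frac{\varepsilon}{4}\right)\frac{1}{N} \le D(h) < \left(1 + 2^{j}\frac{\varepsilon}{4}\right)\frac{1}{N} \right\}
\]

\[
L_j \stackrel{\mathrm{def}}{=} \left\{ \ell \;\middle|\; \left(1 - 2^{j}\frac{\varepsilon}{4}\right)\frac{1}{N} < D(\ell) \le \left(1 - 2^{j-1}\frac{\varepsilon}{4}\right)\frac{1}{N} \right\}
\]

for j ∈ [t − 1], with also H_0, L_0, H_t, L_t to cover everything; each loop iteration on l.3 “focuses” on a particular bucket.

+ Chernoff and union bounds.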


Testing Uniformity – Lower Bound (1)

Theorem (Testing Uniformity with COND)

Any COND_D algorithm for testing whether D = U versus d_TV(D, U) ≥ ε must make Ω(1/ε²) queries.

Remark
As PCOND is a restriction of COND, the previous upper bound was essentially optimal.






High-level idea.
Reduce it to the problem of distinguishing between a fair and a biased coin, by defining a “no-instance” D_no s.t.

1. D_no is ε-far from U;
2. any q-query tester A which distinguishes D_no from U can be turned into a tester A′ distinguishing between (1) a sequence of q fair coin tosses and (2) a sequence of q (4ε)-biased coin tosses.

However, it is known that distinguishing between these two scenarios requires Ω(1/ε²) coin tosses.


[Figure: The no-instance D_no. Plot of D_no(i) against i ∈ [1, N], taking the two values (1 + 2ε)/N and (1 − 2ε)/N, with the change at i = N/2.]
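As a quick sanity check that such a no-instance is ε-far from uniform, here is the distance computation written out. It assumes, as the axis labels of the figure suggest, that N is even and that D_no puts mass (1 + 2ε)/N on each point of one half of [N] and (1 − 2ε)/N on each point of the other half; which half is the heavy one does not affect the result.

```latex
\[
d_{TV}(D_{no}, U)
  = \frac{1}{2}\sum_{i \in [N]} \left| D_{no}(i) - \frac{1}{N} \right|
  = \frac{1}{2}\left( \frac{N}{2}\cdot\frac{2\varepsilon}{N} + \frac{N}{2}\cdot\frac{2\varepsilon}{N} \right)
  = \varepsilon .
\]
```

Every point deviates from 1/N by exactly 2ε/N, so the deviations are spread as thinly as possible while still adding up to distance ε.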


Testing Uniformity – Lower Bound (3)
The reduction: how to simulate COND_D from coin tosses

To run A from A′, we must simulate COND_D (D either U or D_no) to provide the former with samples, given the corresponding coin tosses.

At step 1 ≤ t ≤ q, A chooses to query S ⊂ [N] (according to the (t − 1) previous answers it got from the simulation). A′ behaves as follows:

- sets S_0 := S ∩ [1, N/2], S_1 := S ∩ [N/2 + 1, N];
- gets bit b_t, and draws σ ∼ Bern(u_t) if b_t = 1, and σ ∼ Bern(v_t) otherwise (†);
- draws s u.a.r. from S_σ;
- gives (S, s) to A.

(†) for a right choice of u_t, v_t depending on |S_0|, |S_1|, ε


Testing Uniformity – Summary

- Θ(√N/ε²) with SAMP: counting collisions [GR00, BFR+10, Pan08]
- O(1/ε²) with PCOND: comparing random pairs of points
- Ω(1/ε²) with COND: reducing to fair vs. biased coin

Remark
Testing with ICOND will require a logarithmic dependence on N.




Building tools (1)

Compare
Low-level procedure: compares the relative weight of sets X, Y, given some accuracy parameter η.

Estimate-Neighborhood
On input a point i ∈ [N] and parameter γ, estimates the weight under D of the γ-neighborhood of i, that is, points with probability mass within a factor (1 + γ) of D(i).

Approx-Eval
Given i ∈ [N] and accuracy parameter η, returns an approximation of D(i); succeeds whp for most points i.


Building tools (2)
“Comparison is the death of joy.” – Mark Twain.

The low-level tool Compare

Given as input two disjoint subsets X, Y, parameters η ∈ (0, 1], K ≥ 1, and δ ∈ (0, 1/2], and COND access to D, the procedure Compare outputs either a value ρ > 0, High, or Low, such that:

- If D(X)/K ≤ D(Y) ≤ K · D(X) then w.p. 1 − δ it outputs a value ρ ∈ [1 − η, 1 + η] · D(Y)/D(X);
- If D(Y) > K · D(X) then w.p. 1 − δ it outputs either High or a value ρ ∈ [1 − η, 1 + η] · D(Y)/D(X);
- If D(Y) < D(X)/K then w.p. 1 − δ it outputs either Low or a value ρ ∈ [1 − η, 1 + η] · D(Y)/D(X).

Compare performs O(K log(1/δ)/η²) COND queries on X ∪ Y.
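One natural way to obtain guarantees of this shape is to query COND on X ∪ Y repeatedly and look at the fraction of answers landing in Y. The sketch below follows that idea; it is not claimed to be the authors' exact procedure. The function name `compare`, the constant `C`, the cut-off `(2/3)/(K + 1)`, and the fact that the oracle is simulated from an explicit vector `probs` are all assumptions made for illustration.

```python
import math
import random

def compare(probs, X, Y, eta, K, delta, rng, C=8):
    """Estimate D(Y)/D(X) from COND queries on X ∪ Y, or answer High / Low.

    probs is used only to simulate the oracle; a real tester never sees it.
    C is a hypothetical constant standing in for the O(.) of the query bound.
    """
    X, Y = sorted(set(X)), sorted(set(Y))
    if set(X) & set(Y):
        raise ValueError("X and Y must be disjoint")
    union = X + Y
    weights = [probs[i] for i in union]
    m = math.ceil(C * K * math.log(1 / delta) / eta ** 2)  # number of queries
    in_y = set(Y)
    draws = rng.choices(union, weights=weights, k=m)       # m COND queries on X ∪ Y
    mu_hat = sum(1 for i in draws if i in in_y) / m        # estimates D(Y)/D(X ∪ Y)
    # If D(X)/K <= D(Y) <= K D(X), the true fraction lies in [1/(K+1), K/(K+1)].
    cut = (2 / 3) / (K + 1)
    if mu_hat < cut:
        return "Low"
    if 1 - mu_hat < cut:
        return "High"
    return mu_hat / (1 - mu_hat)                           # estimate of D(Y)/D(X)

rng = random.Random(0)
print(compare([0.5, 0.5], [0], [1], eta=0.1, K=2, delta=0.01, rng=rng))
```

The number of simulated queries, m, scales as K log(1/δ)/η², matching the bound stated above; the key point is that it does not depend on N or on the sizes of X and Y.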


Building tools (3)

[Figure: the possible outputs of Compare on sets X and Y, depending on how D(Y) compares to D(X): Low, a value ρ with ρ · D(X) ≈ D(Y), or High.]


Building tools (4)

Definition (γ-Neighborhood)

\[
U_\gamma(x) \stackrel{\mathrm{def}}{=} \left\{ y \in [N] \;:\; \frac{1}{1+\gamma}\, D(x) \le D(y) \le (1+\gamma)\, D(x) \right\}, \qquad \gamma \in [0, 1]
\]

Goal
Given a point x ∈ [N] and a parameter γ, get an approximation of D(U_γ(x)), i.e., “how much weight does D put on points like x?”
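When D is written out explicitly, the γ-neighborhood and its weight are straightforward to compute, which helps to see what quantity the subroutine is after. The helper names `gamma_neighborhood` and `neighborhood_weight` and the 0-indexed domain are assumptions of this sketch; the actual difficulty, of course, is approximating this weight with few PCOND queries and without knowing D.

```python
def gamma_neighborhood(probs, x, gamma):
    """Points y with D(x)/(1 + gamma) <= D(y) <= (1 + gamma) D(x)."""
    lo = probs[x] / (1 + gamma)
    hi = (1 + gamma) * probs[x]
    return [y for y, p in enumerate(probs) if lo <= p <= hi]

def neighborhood_weight(probs, x, gamma):
    """D(U_gamma(x)): total mass of the points that 'look like' x."""
    return sum(probs[y] for y in gamma_neighborhood(probs, x, gamma))

probs = [0.5, 0.25, 0.125, 0.125]
print(gamma_neighborhood(probs, 0, 0.5))   # [0]
print(neighborhood_weight(probs, 2, 0.0))  # 0.25
```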




Building tools (5)

The (slightly) higher-level subroutine Estimate-Neighborhood

Given as input a point x, parameters γ, β, η, δ ∈ (0, 1/2] and PCOND_D access, the procedure Estimate-Neighborhood outputs a pair (w, α) ∈ [0, 1] × (γ, 2γ) such that, for θ small:

1. If D(U_α(x)) ≥ β, then w.p. 1 − δ we have w ∈ [1 − η, 1 + η] · D(U_α(x)), and D(U_{α+θ}(x) \ U_α(x)) ≤ ηβ/16;
2. If D(U_α(x)) < β, then w.p. 1 − δ we have w ≤ (1 + η) · β, and D(U_{α+θ}(x) \ U_α(x)) ≤ ηβ/16.

Estimate-Neighborhood performs O(log(1/δ) / (γ² η⁴ β³ δ²)) queries.

Remark
It does not estimate D(U_γ(x)) itself: the estimate w is for D(U_α(x)), where α ∈ (γ, 2γ) is chosen by the procedure.




[Figure: the neighborhoods U_γ, U_α, U_{α+θ} and U_{2γ} of x; the region between U_α and U_{α+θ} carries ≃ no weight.]


Building tools (6)

EVAL oracle

A δ-EVAL_D simulator for D is a randomized procedure ORACLE such that w.p. 1 − δ the output of ORACLE on input i* ∈ [N] is D(i*).


Building tools (6)

(Approximate) EVAL oracle

An (ε, δ)-approximate EVAL_D simulator for D is a randomized procedure ORACLE such that w.p. 1 − δ the output of ORACLE on input i* ∈ [N] is a value α ∈ [0, 1] such that α ∈ [1 − ε, 1 + ε] · D(i*).


Building tools (6)


An (ε, δ)-approximate EVAL_D simulator for D is a randomized procedure ORACLE s.t. for each ε, there is a fixed set S(ε) ⊊ [N] with D(S(ε)) < ε for which the following holds. For all i* ∈ [N], ORACLE(i*) is either a value α ∈ [0, 1] or Unknown, and furthermore:

(i) If i* ∉ S(ε) then w.p. 1 − δ the output of ORACLE on input i* is a value α ∈ [0, 1] such that α ∈ [1 − ε, 1 + ε] · D(i*);

(ii) If i* ∈ S(ε) then w.p. 1 − δ the procedure either outputs Unknown or outputs a value α ∈ [0, 1] such that α ∈ [1 − ε, 1 + ε] · D(i*).



The high-level blackbox Approx-Eval

There is an algorithm Approx-Eval which uses O((log N)⁵ · (log(1/δ))² / ε³) calls to COND_D, and is an (ε, δ)-approximate EVAL_D simulator.


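[Figure: Approx-Eval, with parameter ε and access to COND_D, takes an input i* ∈ [N]; it returns (an approximation of) D(i*) if i* ∉ S(ε), and either “Unknown” or (an approximation of) D(i*) if i* ∈ S(ε).]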


Applications

Testing equivalence of two unknown distributions D1, D2

Blackbox access to D1 and D2 (two oracles); distinguish D1 = D2 vs. d_TV(D1, D2) ≥ ε.

In the language of property testing: S_P = {(D, D) | D distribution}, with metric over pairs of distributions d((D, D′), (P, P′)) := d_TV(D, P) + d_TV(D′, P′).

Two different approaches:
1. with PCOND and Estimate-Neighborhood: finding “representative” points for both distributions;
2. with COND and Approx-Eval: adapting an EVAL algorithm from [RS09].

Other uses: estimating distance to uniformity (Estimate-Neighborhood), testing monotonicity³ (Approx-Eval)...

³ (extension of the original results)




Testing Uniformity with ICOND

Main message
ICOND algorithms are weaker than PCOND ones for this: while poly(log N, 1/ε) queries are enough, Ω(log N) are necessary.

Overview
Upper bound: sort of binary descent on random points (custom-tailored version of Approx-Eval), to spot deviations from 1/N;
Lower bound: family of “no-instances” + LB against non-adaptive + hybrid argument to get LB against adaptive.




Upper Bound

[Figure: binary tree of intervals. The root [N] splits into [1, N/2] and [N/2 + 1, N]; [1, N/2] splits into [1, N/4] and [N/4 + 1, N/2]; and so on down to [i − 2, i − 1] and [i, i + 1], the latter splitting into the leaves i and i + 1.]

Figure: Idea of the “binary descent” on i: get an estimate of D(i) by multiplying estimates at each branching, each time rejecting if ratio between weight of two subintervals is far from 1/2. Repeat for Θ(1/ε) points drawn from D.
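The descent described in the caption can be sketched as follows for a single point i. This is an illustration of the idea only, not the tester from the full paper: the function names `icond` and `binary_descent`, the per-level sample count `m`, the tolerance `tol`, the 0-indexed half-open intervals, and the oracle simulated from an explicit vector `probs` are all assumptions. At each level the fraction of weight in the left half is estimated and compared with the left half's share of the points, which is 1/2 whenever N is a power of two.

```python
import random

def icond(probs, lo, hi, rng):
    """Simulated ICOND query on the interval [lo, hi) (0-indexed)."""
    return rng.choices(range(lo, hi), weights=probs[lo:hi], k=1)[0]

def binary_descent(probs, i, m, tol, rng):
    """Walk from [N] down to {i}; return an estimate of D(i), or "REJECT".

    m   : ICOND queries per level (hypothetical parameter)
    tol : allowed deviation of the left-half fraction from its uniform value
    """
    lo, hi, estimate = 0, len(probs), 1.0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        left_hits = sum(1 for _ in range(m) if icond(probs, lo, hi, rng) < mid)
        frac_left = left_hits / m
        expected_left = (mid - lo) / (hi - lo)   # equals 1/2 for even-length intervals
        if abs(frac_left - expected_left) > tol:
            return "REJECT"                      # this branching is visibly unbalanced
        if i < mid:                              # follow the half that contains i
            estimate *= frac_left
            hi = mid
        else:
            estimate *= 1 - frac_left
            lo = mid
    return estimate

rng = random.Random(0)
print(binary_descent([1.0, 0.0, 0.0, 0.0], 0, m=50, tol=0.1, rng=rng))  # REJECT
```

The walk has about log N levels, which is where the polylogarithmic dependence on N in the ICOND upper bound comes from; a full tester would repeat it for Θ(1/ε) points drawn from D, as the caption says.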


Conclusion

- new model for studying probability distributions
- arises naturally in a number of settings
- allows significantly more query-efficient algorithms

- generalizing to other structured domains? (e.g., the Boolean hypercube {0, 1}^n)
- what about distribution learning in this framework?
- more properties? (entropy, independence, monotonicity...)


The end.

Thank you.

The full version of this work is available online (arXiv:1211.2664).

References I

[BFF+01] T. Batu, E. Fischer, L. Fortnow, R. Kumar, R. Rubinfeld, and P. White, Testing random variables for independence and identity, Proceedings of FOCS, 2001, pp. 442–451.

[BFR+00] T. Batu, L. Fortnow, R. Rubinfeld, W. D. Smith, and P. White, Testing that distributions are close, Proceedings of FOCS, 2000, pp. 189–197.

[BFR+10] T. Batu, L. Fortnow, R. Rubinfeld, W. D. Smith, and P. White, Testing closeness of discrete distributions, Tech. Report abs/1009.5397, 2010. This is a long version of [BFR+00].

[CFGM13] S. Chakraborty, E. Fischer, Y. Goldhirsh, and A. Matsliah, On the power of conditional samples in distribution testing, Proceedings of ITCS, 2013. To appear.

[GR00] O. Goldreich and D. Ron, On testing expansion in bounded-degree graphs, Tech. Report TR00-020, ECCC, 2000.

[Pan08] L. Paninski, A coincidence-based test for uniformity given very sparsely sampled discrete data, IEEE-IT 54 (2008), no. 10, 4750–4755.

[RS09] R. Rubinfeld and R. A. Servedio, Testing monotone high-dimensional distributions, RSA 34 (2009), no. 1, 24–44.

[Val11] P. Valiant, Testing symmetric properties of distributions, SICOMP 40 (2011), no. 6, 1927–1968.


References II

[VV10a] G. Valiant and P. Valiant, A CLT and tight lower bounds for estimating entropy, Tech. Report TR10-179, ECCC, 2010.

[VV10b] G. Valiant and P. Valiant, Estimating the unseen: A sublinear-sample canonical estimator of distributions, Tech. Report TR10-180, ECCC, 2010.

[VV11] G. Valiant and P. Valiant, Estimating the unseen: an n/log(n)-sample estimator for entropy and support size, shown optimal via new CLTs, Proceedings of STOC, 2011, pp. 685–694. See also [VV10a] and [VV10b].

