
Smooth Boolean Functions are Easy: Efficient Algorithms for Low-Sensitivity Functions

Rocco Servedio

Joint work with

Parikshit Gopalan (MSR)
Noam Nisan (MSR / Hebrew University)
Kunal Talwar (Google)
Avi Wigderson (IAS)

ITCS 2016

The star of our show:

f: {0,1}^n → {0,1}

“Complexity” and “Boolean functions”

Certificate complexity

Decision tree depth (deterministic, randomized, quantum)

Sensitivity

Block sensitivity

PRAM complexity

Real Polynomial degree (exact, approximate)

Complexity measures: combinatorial/analytic ways to “get a handle on how complicated f is”

All lie in {0,1,…,n} for n-variable Boolean functions.

“Complexity” and “Boolean functions” revisited

Unrestricted circuit size

Unrestricted formula size

AC0 circuit size

DNF size

Complexity classes: computational ways to “get a handle on how complicated f is”

All lie in {0,1,…,2^n} for n-variable Boolean functions.

High-level summary of this work:

A computational perspective on a classic open question about complexity measures of Boolean functions….

...namely, the sensitivity conjecture of [NisanSzegedy92].

Background: Complexity measures

Certificate complexity

Decision tree depth (deterministic, randomized, quantum)

Block sensitivity

PRAM complexity

Real Polynomial degree (exact, approximate)

Fundamental result(s) in Boolean function complexity:

For any Boolean function f, the above complexity measures are all polynomially related to each other.

Examples: DT-depth and real degree

• DT-depth(f) = minimum depth of any decision tree computing f

• deg_R(f) = degree of the unique real multilinear polynomial computing f: {0,1}^n → {0,1}

[Figure: an example decision tree with 0/1-labeled leaves; its DT-depth is 4]

DT-depth and real degree are polynomially related

[Figure: the same decision tree, with the root-to-leaf path of one 1-leaf highlighted]

(The lower bound is easy: for each 1-leaf at depth d, there is a degree-d polynomial outputting 1 iff its input reaches that leaf, and 0 otherwise. Sum these over all 1-leaves.)

Polynomial for this leaf: x_1 x_2 (1 − x_4) x_3
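To make the construction concrete, here is a minimal Python sketch (the tree encoding and names are illustrative, not from the talk): it evaluates the sum of leaf-indicator polynomials and checks it against a small tree.

```python
# A decision tree is a nested tuple: ("leaf", bit) or ("node", i, left, right),
# where `left` is the x_i = 0 branch and `right` is the x_i = 1 branch.

def leaf_poly_sum(tree, x, factor=1):
    """Sum, over all 1-leaves, of the product of x_i / (1 - x_i) factors along
    the root-to-leaf path; on {0,1} inputs this is exactly the tree's output,
    and each 1-leaf at depth d contributes a degree-d product."""
    if tree[0] == "leaf":
        return factor * tree[1]                 # nonzero only for 1-leaves
    _, i, left, right = tree
    return (leaf_poly_sum(left,  x, factor * (1 - x[i])) +
            leaf_poly_sum(right, x, factor * x[i]))

# Depth-2 tree for x_0 AND x_1; the sum agrees with the tree on all inputs.
tree = ("node", 0, ("leaf", 0), ("node", 1, ("leaf", 0), ("leaf", 1)))
assert all(leaf_poly_sum(tree, (a, b)) == (a & b) for a in (0, 1) for b in (0, 1))
```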

Theorem: [NisanSzegedy92,NisanSmolensky,Midrijanis04] For any Boolean function f,

deg_R(f) ≤ DT-depth(f) ≤ 2·deg_R(f)^3.

An outlier among complexity measures: Sensitivity

• s(f,x) = sensitivity of f at x = number of neighbors y of x such that f(x) ≠ f(y)

• s(f) = (max) sensitivity of f = max of s(f,x) over all x in {0,1}^n
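These definitions translate directly into a brute-force check; a small Python sketch (exponential in n, for sanity checks only):

```python
from itertools import product

def sensitivity_at(f, x):
    """s(f,x): the number of Hamming neighbors y of x with f(y) != f(x)."""
    return sum(f(x[:i] + (1 - x[i],) + x[i+1:]) != f(x) for i in range(len(x)))

def sensitivity(f, n):
    """s(f): the maximum of s(f,x) over all x in {0,1}^n."""
    return max(sensitivity_at(f, x) for x in product((0, 1), repeat=n))

# Parity is maximally sensitive; a dictator has sensitivity 1.
assert sensitivity(lambda x: sum(x) % 2, 4) == 4
assert sensitivity(lambda x: x[0], 4) == 1
```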

Folklore: s(f) ≤ DT-depth(f).

Question: [Nisan91,NisanSzegedy92] Is DT-depth(f) ≤ poly(s(f))?


The sensitivity conjecture

Conjecture: DT-depth(f) ≤ poly(s(f)).

Despite much effort, the best known upper bounds are exponential in sensitivity:

• [Simon82]: # relevant variables ≤ s(f)·4^{s(f)}

• [KenyonKutin04]: bs(f) ≤ e^{s(f)}

• [AmbainisBavarianGaoMaoSunZuo14]: bs(f), C(f) ≤ s(f)·2^{s(f)−1}; deg(f) ≤ 2^{s(f)}·(1+o(1))

• [AmbainisPrusisVihrovs15]: bs(f) ≤ (s(f) − 1/3)·2^{s(f)−1}

(Equivalently, the conjecture can be stated with block sensitivity, certificate complexity, real degree, approximate degree, or randomized/quantum DT-depth in place of DT-depth.)


This work: Computational view on the sensitivity conjecture

Previous approaches were combinatorial/analytic. But the conjecture is also a strong computational statement: low-sensitivity functions are very easy to compute!

Conjecture: DT-depth(f) ≤ poly(s(f)).

The conjecture implies that every low-sensitivity function has a small (depth poly(s)) decision tree, hence a small circuit. Prior to this work, even this seems not to have been known. In fact, even an upper bound on the # of sensitivity-s functions seems not to have been known.

Results

Theorem: Every n-variable sensitivity-s function is computed by a Boolean circuit of size n^{O(s)}. In fact, every such function is computed by a Boolean formula of depth O(s log n).
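(The depth bound subsumes the size bound: a bounded fan-in formula of depth D has size at most 2^D, so

$$\text{depth } O(s \log n) \;\Longrightarrow\; \text{size} \le 2^{O(s \log n)} = n^{O(s)}.)$$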

So now the picture is: [Figure: the map of complexity measures, with question marks marking the gaps that remain open]

Results (continued)

Theorem: Any n-variable sensitivity-s function can be self-corrected from a 2^{−cs} fraction of worst-case errors, using n^{O(s)} queries and runtime.

(The circuit/formula size bounds above are consequences of the conjecture; this is another consequence of the conjecture, here proved unconditionally: Conjecture ⟹ low-sensitivity f has low deg_R ⟹ low deg_2 ⟹ f has a self-corrector.)

All results are fairly easy. (Lots of directions for future work!)

Simple but crucial insight

Fact: If f has sensitivity s, then f(x) is completely determined once you know f's value on any 2s+1 neighbors of x.

[Figure: x together with 2s+1 of its neighbors, split into neighbors where f=0 and neighbors where f=1]

Among any 2s+1 neighbors, either at least s+1 are 0-neighbors or at least s+1 are 1-neighbors. The value of f(x) must equal this majority value! (If it disagreed, we would have s(f) ≥ s(f,x) ≥ s+1.)
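A brute-force check of the Fact: for the examples below (a dictator with s = 1, and AND of two variables with s = 2), f(x) equals the majority of f's values over every choice of 2s+1 neighbors.

```python
from itertools import product, combinations

def check_majority_rule(f, n, s):
    """Verify: f(x) = majority of f over ANY 2s+1 neighbors of x (needs n >= 2s+1)."""
    for x in product((0, 1), repeat=n):
        neighbors = [x[:i] + (1 - x[i],) + x[i+1:] for i in range(n)]
        for chosen in combinations(neighbors, 2 * s + 1):
            votes = sum(f(y) for y in chosen)
            assert f(x) == (1 if votes >= s + 1 else 0)

check_majority_rule(lambda x: x[0],        n=4, s=1)   # dictator: s(f) = 1
check_majority_rule(lambda x: x[0] & x[1], n=5, s=2)   # AND of two vars: s(f) = 2
```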


Theorem: Every sensitivity-s function on n variables is uniquely specified by its values on any Hamming ball of radius 2s.

[Figure: the cube between 0^n and 1^n; weight levels 0,…,2s are shaded, and each point on weight level 2s+1 has 2s+1 down-neighbors inside the shaded region]

Proof idea: Say the ball is centered at 0^n, so f is known on weight levels 0,…,2s. Each point on weight level 2s+1 has 2s+1 down-neighbors, all with known values, so the Fact determines f there by majority vote. Now f is known on weight levels 0,…,2s+1, and the same argument fills in level 2s+2, etc. Fill in all of {0,1}^n this way, level by level.
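A sketch of this filling-in procedure in Python (brute-force enumeration of the levels; it reuses the majority argument above):

```python
from itertools import product

def extend_from_ball(ball_values, n, s):
    """Given f on the radius-2s ball around 0^n (a dict keyed by 0/1 tuples),
    reconstruct f on all of {0,1}^n, one weight level at a time."""
    table = dict(ball_values)
    for level in range(2 * s + 1, n + 1):
        for x in product((0, 1), repeat=n):
            if sum(x) != level:
                continue
            # All `level` >= 2s+1 down-neighbors are already known; for a genuine
            # sensitivity-s function the majority among them is f(x) (never a tie).
            down = [x[:i] + (0,) + x[i+1:] for i in range(n) if x[i] == 1]
            table[x] = 1 if 2 * sum(table[y] for y in down) > len(down) else 0
    return table

# Sanity check with the dictator f(x) = x_0 (sensitivity 1), n = 4:
n, s, f = 4, 1, (lambda x: x[0])
ball = {x: f(x) for x in product((0, 1), repeat=n) if sum(x) <= 2 * s}
full = extend_from_ball(ball, n, s)
assert all(full[x] == f(x) for x in product((0, 1), repeat=n))
```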

Corollary: There are at most 2^{(n choose ≤2s)} sensitivity-s functions over {0,1}^n, where (n choose ≤2s) = Σ_{i≤2s} (n choose i) is the number of points in a radius-2s Hamming ball.

Can we use this insight to compute sensitivity-s functions efficiently?

Small circuits for sensitivity-s functions

Theorem: Every n-variable sensitivity-s function has a circuit of size O(s·n^{2s+1}).

The algorithm has the values of f on the bottom 2s+1 layers "hard-coded" in. (Bottom 2s+1 layers = the Hamming ball of radius 2s centered at 0^n.)

Algorithm: For |x| stages,

• Shift the center of the Hamming ball one step along a shortest path to x.

• Use the values of f on the previous Hamming ball to compute the values on the new ball (at most n^{2s} new values to compute; each one is easy via majority vote).

[Figure: the ball slides from its first center 0^n toward x; at each step at most n^{2s} new values are computed]
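A Python sketch of this sequential evaluator (names illustrative; the inner loop brute-forces the new ball rather than enumerating only the ≤ n^{2s} genuinely new points):

```python
from itertools import product

def dist(a, b):
    return sum(u != v for u, v in zip(a, b))

def evaluate(hardcoded_ball, x, n, s):
    """hardcoded_ball: f's values on the radius-2s Hamming ball around 0^n."""
    center, known = (0,) * n, dict(hardcoded_ball)
    while center != x:
        # Shift the center one coordinate closer to x along a shortest path.
        i = next(j for j in range(n) if center[j] != x[j])
        prev, center = center, center[:i] + (x[i],) + center[i+1:]
        for y in product((0, 1), repeat=n):
            if dist(y, center) <= 2 * s and y not in known:
                # y is at distance 2s+1 from the previous center, so its 2s+1
                # neighbors lying closer to that center are all known: majority-vote.
                votes = [known[y[:j] + (1 - y[j],) + y[j+1:]]
                         for j in range(n) if y[j] != prev[j]]
                known[y] = 1 if 2 * sum(votes) > len(votes) else 0
    return known[x]

# Same dictator sanity check as before: f(x) = x_0, n = 4, s = 1.
n, s, f = 4, 1, (lambda x: x[0])
ball = {y: f(y) for y in product((0, 1), repeat=n) if sum(y) <= 2 * s}
assert all(evaluate(ball, x, n, s) == f(x) for x in product((0, 1), repeat=n))
```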

Shallow circuits for sensitivity-s functions?

The algorithm we just saw seems inherently sequential – takes n stages.

Can we parallelize?

Yes, by being bolder: go n/s levels at each stage rather than one.

Extension of earlier key insight

Sensitivity-s functions are noise-stable at every input x.

• Pick any vertex x.

• Flip n/(11s) randomly chosen coordinates to get y.

• View the t-th flipped coordinate as chosen from the n−t+1 still-'untouched' coordinates. At each stage, at most s coordinates are sensitive, so

Pr[f(x) ≠ f(y)] ≤ Pr[stage 1 flips f] + Pr[stage 2 flips f] + … ≤ s/n + s/(n−1) + … + s/(n − n/(11s) + 1) < 1/10.
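Spelled out, the union bound gives (using n − n/(11s) + 1 > 10n/11, valid since s ≥ 1):

$$\Pr[f(x) \neq f(y)] \;\le\; \sum_{t=1}^{n/(11s)} \frac{s}{n-t+1} \;\le\; \frac{n}{11s} \cdot \frac{s}{n - n/(11s) + 1} \;<\; \frac{n}{11} \cdot \frac{11}{10n} \;=\; \frac{1}{10}.$$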


Downward walks

Similar statement holds for “random downward walks.”

• Pick any vertex x with |x| many ones.

• Flip |x|/(11s) randomly chosen 1's to 0's to get y.

• View the t-th flipped coordinate as chosen from the |x|−t+1 still-'untouched' 1-coordinates. Get

Pr[f(x) ≠ f(y)] ≤ s/|x| + s/(|x|−1) + … + s/(|x| − |x|/(11s) + 1) < 1/10.
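A quick Monte Carlo check of the downward-walk bound on the dictator function (sensitivity 1); the walk and the constant 11 follow the talk, the rest is illustrative:

```python
import random

def downward_walk(x, steps):
    """Flip `steps` randomly chosen 1-coordinates of x to 0."""
    y = list(x)
    for i in random.sample([i for i, b in enumerate(y) if b == 1], steps):
        y[i] = 0
    return tuple(y)

def flip_probability(f, x, s, trials=100_000):
    steps = max(1, sum(x) // (11 * s))
    return sum(f(downward_walk(x, steps)) != f(x) for _ in range(trials)) / trials

# Dictator f(x) = x_0, s = 1, starting from the all-ones input of length 12:
# the walk has length 1 and hits coordinate 0 with probability 1/12 < 1/10.
print(flip_probability(lambda x: x[0], (1,) * 12, s=1))   # ~0.083
```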

Shallow circuits for sensitivity-s functions

Theorem: Every n-variable sensitivity-s function has a formula of depth O(s log n).

The algorithm has the values of f on the bottom 11s layers "hard-coded" in.

Parallel-Alg: Given x,

• If |x| < 11s, return the hard-coded value.

• Sample C = O(1) points x_1, x_2, …, x_C from the downward random walk of length |x|/(11s). Call Parallel-Alg on each one.

• Return the majority vote of the C results.

[Figure: x at the top, with C downward walks to x_1, x_2, …, x_C; weight levels 0,…,11s at the bottom are hard-coded]

• Parallel-Alg(x) = f(x) with probability ≥ 19/20, for every x (proof: induction on |x|).

• After O(s log n) stages the recursion bottoms out in the hard-coded "red zone," so the parallel runtime is O(s log n).

• C = O(1), so the total work is C^{O(s log n)} = n^{O(s)}.
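A recursive Python sketch of Parallel-Alg (C is left as a parameter; the walk is as in the previous sketch, and the answer is only correct with high probability):

```python
import random
from itertools import product

def downward_walk(x, steps):
    """Flip `steps` randomly chosen 1-coordinates of x to 0."""
    y = list(x)
    for i in random.sample([i for i, b in enumerate(y) if b == 1], steps):
        y[i] = 0
    return tuple(y)

def parallel_alg(f_ball, x, s, C=9):
    """f_ball: hard-coded values of f on all inputs of weight < 11s."""
    if sum(x) < 11 * s:
        return f_ball[x]
    steps = max(1, sum(x) // (11 * s))
    votes = sum(parallel_alg(f_ball, downward_walk(x, steps), s, C)
                for _ in range(C))
    return 1 if 2 * votes > C else 0

# Dictator f(x) = x_0 with s = 1 on n = 12 variables: the answer is correct
# in the large majority of runs (each run errs with small constant probability).
n, s, f = 12, 1, (lambda x: x[0])
ball = {y: f(y) for y in product((0, 1), repeat=n) if sum(y) < 11 * s}
runs = sum(parallel_alg(ball, (1,) * n, s) == 1 for _ in range(100))
print(runs, "correct out of 100")
```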

Conclusion / Questions

Many questions remain about computational properties of low-sensitivity functions.

We saw there are at most 2^{(n choose ≤2s)} many sensitivity-s functions.

Can this bound be sharpened?

We saw every sensitivity-s function has a formula of depth O(s log n).

Does every such function have a

• TC0 circuit / AC0 circuit / DNF / decision tree of size n^{poly(s)}?

• PTF of degree poly(s)? DNF of width poly(s)? GF(2) polynomial of degree poly(s)?

A closing puzzle/request

We saw that sensitivity-s functions obey a "majority rule" (f(x) = MAJ of f's values on any 2s+1 neighbors of x).

It is well known that degree-d functions obey a "parity rule" (the parity of f's values over any (d+1)-dimensional subcube must be 0).

If the conjecture is true, then low-sensitivity functions have low degree…

…and these two very different-looking rules must coincide!

Explain this!
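As a warm-up to the puzzle, the parity rule itself is easy to verify by brute force; a sketch (the example f = x_0 AND x_1 has real degree 2):

```python
from itertools import product, combinations

def parity_rule_holds(f, n, d):
    """Check: the XOR of f over every (d+1)-dimensional subcube of {0,1}^n is 0."""
    for free in combinations(range(n), d + 1):          # subcube directions
        fixed = [i for i in range(n) if i not in free]
        for vals in product((0, 1), repeat=len(fixed)): # subcube position
            total = 0
            for corner in product((0, 1), repeat=d + 1):
                point = {**dict(zip(fixed, vals)), **dict(zip(free, corner))}
                total ^= f(tuple(point[i] for i in range(n)))
            if total:
                return False
    return True

# f(x) = x_0 AND x_1 = x_0 * x_1 has deg_R(f) = 2, so the XOR of f over every
# 3-dimensional subcube of {0,1}^4 vanishes:
assert parity_rule_holds(lambda x: x[0] & x[1], n=4, d=2)
```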

Thank you for your attention
