Submodular Functions Learnability, Structure & Optimization Nick Harvey, UBC CS Maria-Florina...

Date post: | 01-Jan-2016 |


Submodular Functions: Learnability, Structure & Optimization

Nick Harvey, UBC CS
Maria-Florina Balcan, Georgia Tech

Who studies submodular functions?

• OR, Optimization
• Machine Learning
• AGT, Economics
• CS, Approximation Algorithms

Valuation Functions

A first step in economic modeling:

• Individuals have valuation functions giving utility for different outcomes or events.

Focus on combinatorial settings:

• $n$ items, $\{1,2,\dots,n\} = [n]$
• $f : 2^{[n]} \to \mathbb{R}$.

Learning Valuation Functions

This talk: learning valuation functions from past data.

• Package travel deals

• Bundle pricing

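Submodular valuations

• $[n] = \{1,\dots,n\}$; a function $f : 2^{[n]} \to \mathbb{R}$ is submodular if

For all $S, T \subseteq [n]$: $f(S) + f(T) \ge f(S \cup T) + f(S \cap T)$

• Equivalent to decreasing marginal return:

For $T \subseteq S$, $x \notin S$: $f(T \cup \{x\}) - f(T) \ge f(S \cup \{x\}) - f(S)$

[Figure: adding $x$ to the smaller set $T$ gives a large improvement; adding $x$ to the larger set $S$ gives a small improvement.]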
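Both characterizations of submodularity can be checked by brute force on a small ground set. The following is a minimal sketch, not part of the talk: the helper names and the toy coverage function (with its `AREAS` table) are assumptions made for illustration, and the check is exponential in $n$, so it is only meant for tiny examples.

```python
from itertools import combinations


def all_subsets(n):
    """Yield every subset of {0, ..., n-1} as a frozenset."""
    for size in range(n + 1):
        for combo in combinations(range(n), size):
            yield frozenset(combo)


def is_submodular(f, n):
    """Check f(S) + f(T) >= f(S | T) + f(S & T) for all S, T."""
    subsets = list(all_subsets(n))
    return all(
        f(S) + f(T) >= f(S | T) + f(S & T)
        for S in subsets
        for T in subsets
    )


def has_decreasing_marginals(f, n):
    """Check f(T + x) - f(T) >= f(S + x) - f(S) for all T <= S, x not in S."""
    subsets = list(all_subsets(n))
    for S in subsets:
        for T in subsets:
            if not T <= S:
                continue
            for x in set(range(n)) - S:
                if f(T | {x}) - f(T) < f(S | {x}) - f(S):
                    return False
    return True


# Hypothetical example: item i "covers" the areas listed in AREAS[i].
AREAS = {0: {"a", "b"}, 1: {"b", "c"}, 2: {"c"}}


def coverage(S):
    """Number of distinct areas covered by the items in S (submodular)."""
    covered = set()
    for i in S:
        covered |= AREAS[i]
    return len(covered)


def squared_size(S):
    """|S|^2 has increasing marginal returns, so it is not submodular."""
    return len(S) ** 2
```

On this toy instance the two checks agree: `coverage` passes both and `squared_size` fails both.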

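Submodular valuations

• Decreasing marginal return:

For $T \subseteq S$, $x \notin S$: $f(T \cup \{x\}) - f(T) \ge f(S \cup \{x\}) - f(S)$

[Figure: adding $x$ to $T$ gives a large improvement; adding $x$ to $S$ gives a small improvement.]

E.g.,

• Concave Functions: Let $h : \mathbb{R} \to \mathbb{R}$ be concave. For each $S \subseteq [n]$, let $f(S) = h(|S|)$
• Vector Spaces: Let $V = \{v_1,\dots,v_n\}$, each $v_i \in \mathbb{F}^n$. For each $S \subseteq [n]$, let $f(S) = \mathrm{rank}(\{ v_i : i \in S \})$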
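The two example families can be instantiated and tested on a small ground set. This is a sketch under stated assumptions: the concave function $h(k) = \min(k, 2)$, the field $\mathbb{F}_2$, and the four vectors in `VECTORS` are illustrative choices, not taken from the talk.

```python
from itertools import combinations


def all_subsets(n):
    """Yield every subset of {0, ..., n-1} as a frozenset."""
    for size in range(n + 1):
        for combo in combinations(range(n), size):
            yield frozenset(combo)


def is_submodular(f, n):
    """Brute-force check of f(S) + f(T) >= f(S | T) + f(S & T)."""
    subsets = list(all_subsets(n))
    return all(
        f(S) + f(T) >= f(S | T) + f(S & T)
        for S in subsets
        for T in subsets
    )


def concave_of_size(S):
    """f(S) = h(|S|) with the concave function h(k) = min(k, 2)."""
    return min(len(S), 2)


def gf2_rank(vectors):
    """Rank over F_2 of vectors encoded as integer bitmasks."""
    pivots = {}  # leading-bit position -> stored vector with that leading bit
    for v in vectors:
        while v:
            top = v.bit_length() - 1
            if top not in pivots:
                pivots[top] = v
                break
            v ^= pivots[top]  # cancel the leading bit and keep reducing
    return len(pivots)


# Assumed example vectors in F_2^3; note VECTORS[2] = VECTORS[0] + VECTORS[1].
VECTORS = [0b001, 0b010, 0b011, 0b100]


def rank_function(S):
    """f(S) = rank of the vectors indexed by S."""
    return gf2_rank([VECTORS[i] for i in S])
```

Here `rank_function(frozenset({0, 1, 2}))` is 2 because the third vector is the sum of the first two, which is exactly the kind of diminishing return the definition captures.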

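Passive Supervised Learning

[Diagram: a data source draws sets $S_1,\dots,S_k$ from a distribution $D$ on $2^{[n]}$; an expert / oracle labels them using $f : 2^{[n]} \to \mathbb{R}_+$; the learning algorithm receives the labeled examples $(S_1,f(S_1)),\dots,(S_k,f(S_k))$ and outputs $g : 2^{[n]} \to \mathbb{R}_+$.]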

PMAC model for learning real valued functions

[Diagram: as before, with data source, distribution $D$ on $2^{[n]}$, expert / oracle $f : 2^{[n]} \to \mathbb{R}_+$, labeled examples $(S_1,f(S_1)),\dots,(S_k,f(S_k))$, learning algorithm, and output $g : 2^{[n]} \to \mathbb{R}_+$. Annotation: in the Boolean PAC model, $\{0,1\}$ takes the place of $\mathbb{R}_+$.]

• Alg. sees $(S_1,f(S_1)),\dots,(S_k,f(S_k))$, $S_i$ i.i.d. from $D$, produces $g$

• With probability $\ge 1-\delta$, we have $\Pr_S[\, g(S) \le f(S) \le \alpha\, g(S) \,] \ge 1-\epsilon$

Probably Mostly Approximately Correct
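The inner PMAC condition can be estimated empirically: draw sets from $D$ and count how often $g(S) \le f(S) \le \alpha\, g(S)$ holds. A minimal sketch follows; the function names, the uniform distribution, and the toy target and hypothesis are assumptions for illustration only.

```python
import random


def pmac_success_rate(f, g, alpha, samples):
    """Fraction of sampled sets S with g(S) <= f(S) <= alpha * g(S)."""
    good = sum(1 for S in samples if g(S) <= f(S) <= alpha * g(S))
    return good / len(samples)


def draw_uniform_sets(n, count, seed=0):
    """Draw `count` sets from the uniform distribution on subsets of range(n)."""
    rng = random.Random(seed)
    return [
        frozenset(i for i in range(n) if rng.random() < 0.5)
        for _ in range(count)
    ]


def target(S):
    """Assumed target function f(S) = |S|."""
    return len(S)


def hypothesis(S):
    """Assumed hypothesis g(S) = |S| / 2, which underestimates f by a factor 2."""
    return len(S) / 2


samples = draw_uniform_sets(n=8, count=100)
rate = pmac_success_rate(target, hypothesis, alpha=2.0, samples=samples)
```

With $\alpha = 2$ this hypothesis satisfies the condition on every set, so the estimated rate is 1; shrinking $\alpha$ below 2 makes it fail on every non-empty set.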

Learning submodular functions

Theorem: (Our general upper bound)
Monotone, submodular functions can be PMAC-learned (w.r.t. an arbitrary distribution) with approximation factor $\alpha = O(n^{1/2})$.

Theorem: (Our general lower bound)
Monotone, submodular functions cannot be PMAC-learned with approximation factor $\tilde{o}(n^{1/3})$.

Corollary: Gross substitutes functions do not have a concise, approximate representation.

Theorem: (Product distributions)
Lipschitz, monotone submodular functions can be PMAC-learned under a product distribution with approximation factor $O(1)$.


Computing Linear Separators

[Figure: points labeled + and – in the plane, separated by a line.]

• Given $\{+,-\}$-labeled points in $\mathbb{R}^n$, find a hyperplane $c^T x = b$ that separates the +s and –s.
• Easily solved by linear programming.
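For a concrete feel of the separation problem, here is a small dependency-free sketch. Note that it does not use linear programming as the slide suggests: it swaps in the classic perceptron update rule, which also finds a separating hyperplane when one exists with positive margin. The function names, the pass limit, and the sample points are assumptions for illustration.

```python
def find_separator(points, labels, max_passes=1000):
    """Perceptron: return (c, b) with label * (c.x - b) > 0 on every point.

    Returns None if no separator was found within max_passes passes.
    """
    dim = len(points[0])
    c = [0.0] * dim
    b = 0.0
    for _ in range(max_passes):
        mistakes = 0
        for x, y in zip(points, labels):
            margin = y * (sum(ci * xi for ci, xi in zip(c, x)) - b)
            if margin <= 0:
                # Move the hyperplane toward classifying (x, y) correctly.
                c = [ci + y * xi for ci, xi in zip(c, x)]
                b -= y
                mistakes += 1
        if mistakes == 0:
            return c, b
    return None


def separates(c, b, points, labels):
    """True if every point lies strictly on its labeled side of c.x = b."""
    return all(
        y * (sum(ci * xi for ci, xi in zip(c, x)) - b) > 0
        for x, y in zip(points, labels)
    )


# Assumed toy data: +1 points lie above the line x1 + x2 = 1.5, -1 points below.
POINTS = [(2.0, 2.0), (3.0, 1.0), (2.0, 3.0), (0.0, 0.0), (-1.0, 1.0), (0.0, -1.0)]
LABELS = [1, 1, 1, -1, -1, -1]
```

A linear program would additionally certify infeasibility when no separator exists, whereas this sketch simply gives up after `max_passes` passes.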

Learning Linear Separators

[Figure: points labeled + and – with a separating line; a point on the wrong side is marked "Error!".]

• Given random sample of $\{+,-\}$-labeled points in $\mathbb{R}^n$, find a hyperplane $c^T x = b$ that separates most of the +s and –s.
• Classic machine learning problem.

Learning Linear Separators

[Figure: points labeled + and – with a separating line; a point on the wrong side is marked "Error!".]

• Classic Theorem: [Vapnik-Chervonenkis 1971?] $\tilde{O}(n/\epsilon^2)$ samples suffice to get error $\epsilon$.

Submodular Functions are Approximately Linear

• Let $f$ be non-negative, monotone and submodular
• Claim: $f$ can be approximated to within factor $n$ by a linear function $g$.
• Proof Sketch: Let $g(S) = \sum_{s \in S} f(\{s\})$. Then $f(S) \le g(S) \le n \cdot f(S)$.

Submodularity: $f(S)+f(T) \ge f(S \cap T)+f(S \cup T)$ $\forall S,T \subseteq V$
Monotonicity: $f(S) \le f(T)$ $\forall S \subseteq T$
Non-negativity: $f(S) \ge 0$ $\forall S \subseteq V$
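The sandwich $f(S) \le g(S) \le n \cdot f(S)$ from the proof sketch can be verified exhaustively on a small instance. The sketch below assumes a toy coverage function (the `AREAS` table is made up for illustration) as the non-negative, monotone, submodular $f$.

```python
from itertools import combinations


def all_subsets(n):
    """Yield every subset of {0, ..., n-1} as a frozenset."""
    for size in range(n + 1):
        for combo in combinations(range(n), size):
            yield frozenset(combo)


def linear_approximation(f):
    """Return g(S) = sum over s in S of f({s}), as in the proof sketch."""
    def g(S):
        return sum(f(frozenset({s})) for s in S)
    return g


def satisfies_sandwich(f, n):
    """Check f(S) <= g(S) <= n * f(S) for every subset S of range(n)."""
    g = linear_approximation(f)
    return all(f(S) <= g(S) <= n * f(S) for S in all_subsets(n))


# Hypothetical example: item i "covers" the areas listed in AREAS[i].
AREAS = {0: {"a", "b"}, 1: {"b", "c"}, 2: {"c"}}


def coverage(S):
    """Number of distinct areas covered by S (monotone and submodular)."""
    covered = set()
    for i in S:
        covered |= AREAS[i]
    return len(covered)
```

The left inequality uses submodularity and non-negativity, and the right one uses monotonicity; a function such as $|S|^2$, which is not submodular, already violates the left inequality on sets of size two.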

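[Figure: over the subsets of $V$, the values of $f$ are marked + and the values of $n \cdot f$ are marked –; a linear function $g$ lies between them.]

• Randomly sample $\{S_1,\dots,S_k\}$ from distribution
• Create + for $f(S_i)$ and – for $n \cdot f(S_i)$
• Now just learn a linear separator!

[Figure: the same picture with $f^2$ and $n \cdot f^2$, separated by $g$.]

• Can improve to $O(n^{1/2})$: in fact $f^2$ and $n \cdot f^2$ are separated by a linear function [Goemans et al. '09]
• John's Ellipsoid theorem: any centrally symmetric convex body is approximated by an ellipsoid to within factor $n^{1/2}$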


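[Figure: the lattice of subsets from $\emptyset$ to $V$, with the sets $A_1, A_2, A_3, \dots, A_k$ marked as bumps.]

$$f(S) = \begin{cases} k-1 & \text{if } S \in \mathcal{A} \\ |S| & \text{else if } |S| \le k \\ k & \text{otherwise} \end{cases}$$

$\mathcal{A} = \{A_1,\dots,A_m\}$, $|A_i| = k$

Claim: $f$ is submodular if $|A_i \cap A_j| \le k-2$ $\forall i \ne j$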
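The claim can be sanity-checked by brute force on a tiny ground set. This is a sketch under assumptions: $n = 6$, $k = 3$, and the particular bump sets are illustrative choices, and the bump case is taken to override the $|S| \le k$ case (each $A_i$ has size exactly $k$).

```python
from itertools import combinations


def make_bump_function(k, bumps):
    """f(S) = k-1 on the bump sets, and min(|S|, k) everywhere else."""
    bump_sets = {frozenset(A) for A in bumps}
    if any(len(A) != k for A in bump_sets):
        raise ValueError("every bump set must have size exactly k")

    def f(S):
        if frozenset(S) in bump_sets:
            return k - 1
        return min(len(S), k)

    return f


def far_apart(k, bumps):
    """Check the claim's hypothesis: |A_i & A_j| <= k-2 for all i != j."""
    return all(len(set(A) & set(B)) <= k - 2 for A, B in combinations(bumps, 2))


def is_submodular(f, n):
    """Brute-force check of f(S) + f(T) >= f(S | T) + f(S & T)."""
    subsets = [
        frozenset(combo)
        for size in range(n + 1)
        for combo in combinations(range(n), size)
    ]
    return all(
        f(S) + f(T) >= f(S | T) + f(S & T)
        for S in subsets
        for T in subsets
    )


# Assumed examples on the ground set {0, ..., 5} with k = 3.
GOOD_BUMPS = [{0, 1, 2}, {2, 3, 4}, {0, 3, 5}]  # pairwise intersections of size 1
BAD_BUMPS = [{0, 1, 2}, {1, 2, 3}]              # intersection of size 2 > k-2
```

With `BAD_BUMPS`, taking $S = \{0,1,2\}$ and $T = \{1,2,3\}$ gives $f(S) + f(T) = 4 < 5 = f(S \cup T) + f(S \cap T)$, so the intersection condition is doing real work in this example.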

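[Figure: the lattice of subsets from $\emptyset$ to $V$, with bumps $A_1, A_2, A_3, \dots, A_k$. Annotation: if the algorithm sees only these examples, then $f$ can't be predicted here.]

$$f(S) = \begin{cases} k-1 & \text{if } S \in \mathcal{A} \text{ and wasn't deleted} \\ |S| & \text{else if } |S| \le k \\ k & \text{otherwise} \end{cases}$$

Delete half of the bumps at random. Then $f$ is very unconcentrated on $\mathcal{A}$ $\Rightarrow$ any algorithm to learn $f$ has additive error 1.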

[Figure: the lattice of subsets from $\emptyset$ to $V$, with larger bumps at $A_1, A_2, A_3, \dots, A_k$.]

Can we force a bigger error with bigger bumps?

Yes, if $A_i$'s are very "far apart". This can be achieved by picking them randomly.

Theorem: (Main lower bound construction)
There is a distribution $D$ and a randomly chosen function $f$ s.t.
• $f$ is monotone, submodular
• Knowing the value of $f$ on poly($n$) random samples from $D$ does not suffice to predict the value of $f$ on future samples from $D$, even to within a factor $\tilde{o}(n^{1/3})$.

Plan:
• Choose two values High $= n^{1/3}$ and Low $= O(\log^2 n)$.
• Choose random sets $A_1,\dots,A_m \subseteq [n]$, with $|A_i| =$ High and $m = n^{\log n}$.
• $D$ is the uniform distribution on $\{A_1,\dots,A_m\}$.
• Create a function $f : 2^{[n]} \to \mathbb{R}$. For each $i$, randomly set $f(A_i) =$ High or $f(A_i) =$ Low.
• Extend $f$ to a monotone, submodular function on $2^{[n]}$.

Creating the function f

• We choose $f$ to be a matroid rank function
  – Such functions have a rich combinatorial structure, and are always submodular
• The randomly chosen $A_i$'s form an expander: [expansion condition not preserved in the transcript] where $H = \{\, j : f(A_j) = \text{High} \,\}$
• The expansion property can be leveraged to ensure $f(A_i) =$ High or $f(A_i) =$ Low as desired.


Gross Substitutes Functions

• Class of utility functions commonly used in mechanism design [Kelso-Crawford '82, Gul-Stacchetti '99, Milgrom '00, …]
• Intuitively, increasing the prices for some items does not decrease demand for the other items.
• Question: [Blumrosen-Nisan, Bing-Lehman-Milgrom] Do GS functions have a concise representation?

Gross Substitutes Functions

• Class of utility functions commonly used in mechanism design [Kelso, Crawford, Gul, Stacchetti, …]
• Question: [Blumrosen-Nisan, Bing-Lehman-Milgrom] Do GS functions have a concise representation?
• Fact: Every matroid rank function is GS.

Theorem: (Main lower bound construction)
There is a distribution $D$ and a randomly chosen function $f$ s.t.
• $f$ is a matroid rank function
• poly($n$) bits of information do not suffice to predict the value of $f$ on samples from $D$, even to within a factor $\tilde{o}(n^{1/3})$.

• Corollary: The answer to the question is no.


Learning submodular functions

Theorem: (Product distributions)

• Hypotheses:
  – $\Pr_{X \sim D}[\, X = x \,] = \prod_i \Pr[\, X_i = x_i \,]$ ("Product distribution")
  – $f(\{i\}) \in [0,1]$ for all $i \in [n]$ ("Lipschitz function")
  – $f(\{i\}) \in \{0,1\}$ for all $i \in [n]$ (Stronger condition!)

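[Figure: the lattice of subsets from $\emptyset$ to $V$.]

Technical Theorem: For any $\epsilon > 0$, there exists a concave function $h : [0,n] \to \mathbb{R}$ s.t. for every $k \in [n]$, and for a $1-\epsilon$ fraction of $S \subseteq V$ with $|S| = k$, we have:

$h(k) \le f(S) \le O(\log_2(1/\epsilon)) \cdot h(k)$.

In fact, $h(k)$ is just $E[\, f(S) \,]$, where $S$ is uniform on sets of size $k$.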

Theorem: (Product distributions)

Algorithm:
• Let $\mu = \sum_{i=1}^{m} f(x_i) / m$
• Let $g$ be the constant function with value $\mu$

This achieves approximation factor $O(\log_2(1/\epsilon))$ on a $1-\epsilon$ fraction of points, with high probability.
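The algorithm on this slide is short enough to sketch directly: average the observed values and output that constant. In the sketch below, the product distribution (each of 20 items included independently with probability 0.5), the target $f(S) = \min(|S|, 5)$, the sample sizes, and the two-sided "within a factor $\alpha$" check are all assumptions chosen for illustration; they are not the talk's formal guarantee.

```python
import random


def learn_constant(labeled_examples):
    """Return the constant hypothesis g(S) = mu, the mean observed value."""
    values = [value for _, value in labeled_examples]
    mu = sum(values) / len(values)
    return lambda S: mu


def draw_from_product(n, p, rng):
    """Draw a set in which each of the n items appears independently w.p. p."""
    return frozenset(i for i in range(n) if rng.random() < p)


def within_factor(a, b, alpha):
    """True if a and b are within a multiplicative factor alpha of each other."""
    if a == 0 or b == 0:
        return a == b
    return max(a / b, b / a) <= alpha


def capped_size(S):
    """Assumed target: monotone, submodular, with f({i}) = 1 for every item."""
    return min(len(S), 5)


rng = random.Random(0)
train = [draw_from_product(20, 0.5, rng) for _ in range(200)]
g = learn_constant([(S, capped_size(S)) for S in train])

# Evaluate on fresh samples from the same product distribution.
fresh = [draw_from_product(20, 0.5, rng) for _ in range(500)]
fraction_close = sum(
    within_factor(capped_size(S), g(S), 2.0) for S in fresh
) / len(fresh)
```

On this toy instance almost every sampled set has at least five items, so $f$ is nearly constant under $D$ and the mean is a good predictor, which is the "smooth under product distributions" behavior the talk describes.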

Concentration Lemma: Let $X$ have a product distribution. For any $\alpha \in [0,1]$, [inequality not preserved in the transcript]

Proof: Based on Talagrand's concentration inequality.

Follow-up work

• Subadditive & XOS functions [Badanidiyuru et al., Balcan et al.]
  – $O(n^{1/2})$ approximation
  – $\tilde{\Omega}(n^{1/2})$ inapproximability

• Symmetric submodular functions [Balcan et al.]
  – $O(n^{1/2})$ approximation
  – $\tilde{\Omega}(n^{1/3})$ inapproximability

Conclusions

• Learning-theoretic view of submodular fns
• Structural properties:
  – Very "bumpy" under arbitrary distributions
  – Very "smooth" under product distributions
• Learnability in PMAC model:
  – $O(n^{1/2})$ approximation algorithm
  – $\tilde{\Omega}(n^{1/3})$ inapproximability
  – $O(1)$ approx for Lipschitz fns & product distrs
• No concise representation for gross substitutes
