McDiarmid’s Inequality
Ashish Rastogi
Motivation
• Generalization bounds:
• capacity measures [covering numbers, Rademacher complexity, VC theory]
• stability-based bounds
• Applications:
• chromatic number
McDiarmid’s Inequality
• Theorem: Let $X_1, \ldots, X_m$ be independent random variables, all taking values in the set $\mathcal{X}$. Further, let $f : \mathcal{X}^m \to \mathbb{R}$ be a function of $X_1, \ldots, X_m$ that satisfies, for all $i$ and all $x_1, \ldots, x_m, x_i' \in \mathcal{X}$,
$$|f(x_1, \ldots, x_i, \ldots, x_m) - f(x_1, \ldots, x_i', \ldots, x_m)| \le c_i.$$
Then for all $\epsilon > 0$,
$$\Pr\left[f - \mathbb{E}[f] \ge \epsilon\right] \le \exp\left(\frac{-2\epsilon^2}{\sum_{i=1}^m c_i^2}\right).$$
• Corollary (Hoeffding’s Inequality): For $X_i \in [a_i, b_i]$, $f = \frac{1}{m}\sum_{i=1}^m X_i$, and $c_i = \frac{b_i - a_i}{m}$,
$$\Pr\left[f - \mathbb{E}[f] \ge \epsilon\right] \le \exp\left(\frac{-2\epsilon^2 m^2}{\sum_{i=1}^m (b_i - a_i)^2}\right).$$
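The corollary can be sanity-checked numerically. A minimal Monte Carlo sketch (not from the slides; the function names and the Uniform[0,1] example are illustrative), comparing the empirical deviation probability of a sample mean against the Hoeffding bound:

```python
import math
import random

def hoeffding_bound(eps, m, a=0.0, b=1.0):
    # For equal intervals, exp(-2 eps^2 m^2 / (m (b-a)^2)) = exp(-2 eps^2 m / (b-a)^2).
    return math.exp(-2 * eps ** 2 * m / (b - a) ** 2)

def empirical_deviation(eps, m, trials=20000, seed=0):
    # Fraction of trials where the mean of m Uniform[0,1] draws exceeds its mean 0.5 by eps.
    rng = random.Random(seed)
    hits = sum(
        1
        for _ in range(trials)
        if sum(rng.random() for _ in range(m)) / m - 0.5 >= eps
    )
    return hits / trials

print(empirical_deviation(0.1, 50), "<=", hoeffding_bound(0.1, 50))
```

The empirical frequency is far below the bound here, as expected: Hoeffding's inequality is distribution-free and therefore loose for any particular distribution.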
Proof Elements
• Markov’s Inequality: For a non-negative random variable $X$ and $t > 0$,
$$\Pr[X \ge t] \le \frac{\mathbb{E}[X]}{t}.$$
• Proof:
$$\mathbb{E}[X] = \sum_x x \Pr[X = x] \ge \sum_{x \ge t} x \Pr[X = x] \ge t \sum_{x \ge t} \Pr[X = x] = t \Pr[X \ge t].$$
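Markov's inequality also holds exactly for any empirical distribution, which makes it easy to check in code. A small sketch (the helper name is illustrative; an exponential sample stands in for a generic non-negative variable):

```python
import random

def markov_check(samples, t):
    # Returns (empirical Pr[X >= t], E[X]/t) for a non-negative sample.
    mean = sum(samples) / len(samples)
    tail = sum(1 for x in samples if x >= t) / len(samples)
    return tail, mean / t

rng = random.Random(0)
samples = [rng.expovariate(1.0) for _ in range(100000)]  # non-negative, E[X] = 1
tail, bound = markov_check(samples, 3.0)
print(tail, "<=", bound)
```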
Law of Iterated Expectation
• For random variables $X, Y, Z$:
$$\mathbb{E}\big[\mathbb{E}[X \mid Y, Z] \mid Z\big] = \mathbb{E}[X \mid Z].$$
• Proof: follows from definitions.
• Idea: taking the expectation conditioned on $Y$ and then taking the expectation over the values of $Y$ is the same as taking the expectation all at once.
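The identity can be verified exactly on a small discrete distribution. A sketch (the triples and the helper function are made up for illustration), using exact rational arithmetic:

```python
from fractions import Fraction

# A small joint distribution: (X, Y, Z) uniform over these triples.
triples = [(1, 0, 0), (2, 0, 1), (3, 1, 0), (4, 1, 1), (5, 0, 0), (6, 1, 1)]

def e_x_given(**cond):
    # E[X | Y=y and/or Z=z], conditioning on the named coordinates.
    idx = {"y": 1, "z": 2}
    match = [t for t in triples if all(t[idx[k]] == v for k, v in cond.items())]
    return Fraction(sum(t[0] for t in match), len(match))

for z in {t[2] for t in triples}:
    # E[E[X | Y, Z] | Z=z]: average E[X | Y=y, Z=z] over the conditional law of Y.
    ys = [t[1] for t in triples if t[2] == z]
    lhs = sum(e_x_given(y=y, z=z) for y in ys) / len(ys)
    assert lhs == e_x_given(z=z)  # matches E[X | Z=z] exactly
```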
Proof Elements
• Hoeffding’s Lemma: Let $X$ be a random variable with $\mathbb{E}[X] = 0$ and $a \le X \le b$. Then for $t > 0$,
$$\mathbb{E}[e^{tX}] \le \exp\left(\frac{t^2 (b - a)^2}{8}\right).$$
• Proof: Convexity and Taylor’s Theorem (done on the board).
[Figure: plot of $e^{tx}$ and the chord joining $x = a$ and $x = b$]
Hoeffding’s Lemma
• Convexity implies:
$$e^{tx} \le \frac{b - x}{b - a} e^{ta} + \frac{x - a}{b - a} e^{tb}.$$
• Taking expectations on both sides (using $\mathbb{E}[X] = 0$):
$$\mathbb{E}[e^{tX}] \le \frac{b}{b - a} e^{ta} - \frac{a}{b - a} e^{tb}.$$
• Set
$$e^{\phi(t)} := \frac{b}{b - a} e^{ta} - \frac{a}{b - a} e^{tb}.$$
• Observe
$$\phi(0) = 0, \quad \phi'(0) = 0, \quad \phi''(t) \le \frac{(b - a)^2}{4},$$
so by Taylor’s Theorem $\phi(t) \le \frac{t^2 (b - a)^2}{8}$.
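The lemma's conclusion can be checked numerically for a concrete zero-mean variable. A sketch (the two-point distribution and helper name are arbitrary illustrations):

```python
import math

def mgf_within_bound(a, b, p, t, tol=1e-12):
    # X = a w.p. p, X = b w.p. 1-p; requires E[X] = 0.
    assert abs(p * a + (1 - p) * b) < tol
    mgf = p * math.exp(t * a) + (1 - p) * math.exp(t * b)
    return mgf <= math.exp(t ** 2 * (b - a) ** 2 / 8) + tol

# X = -1 w.p. 3/4 and +3 w.p. 1/4 has mean zero; the bound holds across a range of t.
print(all(mgf_within_bound(-1.0, 3.0, 0.75, t / 10) for t in range(1, 31)))  # prints True
```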
McDiarmid’s Inequality
• Theorem: Let $X_1, \ldots, X_m$ be independent random variables, all taking values in the set $\mathcal{X}$. Further, let $f : \mathcal{X}^m \to \mathbb{R}$ be a function of $X_1, \ldots, X_m$ that satisfies, for all $i$ and all $x_1, \ldots, x_m, x_i' \in \mathcal{X}$,
$$|f(x_1, \ldots, x_i, \ldots, x_m) - f(x_1, \ldots, x_i', \ldots, x_m)| \le c_i.$$
Then for all $\epsilon > 0$,
$$\Pr\left[f - \mathbb{E}[f] \ge \epsilon\right] \le \exp\left(\frac{-2\epsilon^2}{\sum_{i=1}^m c_i^2}\right).$$
• Proof: Let $X_1^i$ denote the sequence of random variables $X_1, \ldots, X_i$. Define random variables $Z_i = \mathbb{E}[f(X) \mid X_1^i]$. Observe that $Z_0 = \mathbb{E}[f]$ and $Z_m = f(X)$.
Proof continued
• Consider the random variable $Z_i - Z_{i-1} \mid X_1^{i-1}$.
• Observation 1: $\mathbb{E}[Z_i - Z_{i-1} \mid X_1^{i-1}] = 0$.
• Observation 2:
• Let $U_i = \sup_u \{\mathbb{E}[f \mid X_1^{i-1}, u] - \mathbb{E}[f \mid X_1^{i-1}]\}$.
• Let $L_i = \inf_l \{\mathbb{E}[f \mid X_1^{i-1}, l] - \mathbb{E}[f \mid X_1^{i-1}]\}$.
• Note that $L_i \le (Z_i - Z_{i-1}) \mid X_1^{i-1} \le U_i$.
• Finally, $U_i - L_i \le c_i$.
• Thus, by Hoeffding’s Lemma,
$$\mathbb{E}\big[e^{t(Z_i - Z_{i-1})} \mid X_1^{i-1}\big] \le e^{t^2 c_i^2 / 8}.$$
Proof continued
$$\begin{aligned}
\Pr\left[f - \mathbb{E}[f] \ge \epsilon\right]
&= \Pr\left[e^{t(f - \mathbb{E}[f])} \ge e^{t\epsilon}\right] \\
&\le e^{-t\epsilon}\,\mathbb{E}\left[e^{t(f - \mathbb{E}[f])}\right] && \text{(Markov’s Inequality)} \\
&= e^{-t\epsilon}\,\mathbb{E}\left[e^{t\sum_{i=1}^m (Z_i - Z_{i-1})}\right] && \text{(Telescoping)} \\
&= e^{-t\epsilon}\,\mathbb{E}\left[\mathbb{E}\big[e^{t\sum_{i=1}^m (Z_i - Z_{i-1})} \mid X_1^{m-1}\big]\right] && \text{(Iterated Expectation)} \\
&= e^{-t\epsilon}\,\mathbb{E}\left[e^{t\sum_{i=1}^{m-1} (Z_i - Z_{i-1})}\,\mathbb{E}\big[e^{t(Z_m - Z_{m-1})} \mid X_1^{m-1}\big]\right] \\
&\le e^{-t\epsilon} e^{t^2 c_m^2 / 8}\,\mathbb{E}\left[e^{t\sum_{i=1}^{m-1} (Z_i - Z_{i-1})}\right]
\end{aligned}$$
Thus, iterating over $i = m, \ldots, 1$,
$$\Pr[f - \mathbb{E}[f] \ge \epsilon] \le \exp\left(-t\epsilon + \frac{t^2}{8}\sum_{i=1}^m c_i^2\right).$$
Proof continued
• Choose $t$ that minimizes $-t\epsilon + \frac{t^2}{8}\sum_{i=1}^m c_i^2$.
• This leads to $t = \frac{4\epsilon}{\sum_{i=1}^m c_i^2}$.
• And therefore, $-t\epsilon + \frac{t^2}{8}\sum_{i=1}^m c_i^2 = \frac{-2\epsilon^2}{\sum_{i=1}^m c_i^2}$.
• Thus,
$$\Pr[f - \mathbb{E}[f] \ge \epsilon] \le \exp\left(\frac{-2\epsilon^2}{\sum_{i=1}^m c_i^2}\right).$$
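The minimization step is elementary calculus and easy to confirm numerically. A sketch (`S` stands in for $\sum_i c_i^2$; the constants are arbitrary):

```python
eps, S = 0.3, 2.0  # epsilon and S = sum of c_i^2, chosen arbitrarily

def exponent(t):
    return -t * eps + t ** 2 * S / 8

t_star = 4 * eps / S
# The stationary point attains -2 eps^2 / S ...
assert abs(exponent(t_star) - (-2 * eps ** 2 / S)) < 1e-12
# ... and it is a minimum: nearby values of t do no better.
assert all(exponent(t_star) <= exponent(t_star + d) for d in (-0.1, -0.01, 0.01, 0.1))
print(t_star, exponent(t_star))
```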
Stability of an Algorithm
• Idea: a small change in the training set leads to a small change in the hypothesis.
• “Sufficient” stability leads to generalization (McDiarmid’s ineq.)
• Advantage: algorithm-specific; the analysis is independent of any capacity term.
• Training set $S$ produces $h_S$; training set $S'$ produces $h_{S'}$.
• Definition ($\beta$-stability): When $S$ and $S'$ differ in exactly one point, then for all $x \in \mathcal{X}$,
$$|c(h_S, x) - c(h_{S'}, x)| \le \beta.$$
Ingredients of a Generalization Bound
• Errors:
• test error: $R(h, S) = \mathbb{E}_{x \sim D}[c(h_S, x)]$
• training error: $\hat{R}(h, S) = \frac{1}{m}\sum_{i=1}^m c(h_S, x_i)$
• Shape of the generalization bound:
$$R(h, S) \le \hat{R}(h, S) + \text{stability-dependent terms}.$$
• Key step: for a hypothesis $h$, deriving a bound on
$$\Pr_{S \sim D^m}\big[|R(h, S) - \hat{R}(h, S)| \ge \epsilon\big].$$
From Stability to Generalization
• Apply McDiarmid’s inequality to the random variable:
$$f(S) = R(h, S) - \hat{R}(h, S).$$
• Need to bound:
• for $S$ and $S'$ differing in one point, $|f(S) - f(S')|$;
• the expectation, $\mathbb{E}_{S \sim D^m}[f(S)]$.
• Let $A$ be a $\beta$-stable learning algorithm with respect to a cost function $c$, and let the cost function be bounded, i.e. for all $x \in \mathcal{X}$ and $h \in H$, $c(h, x) \le M$ for some $M > 0$. Then,
• $\mathbb{E}[f(S)] \le \beta$
• $|f(S) - f(S')| \le 2\beta + \frac{M}{m}$
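The bounded-difference claim can be justified in one line from the definitions above (a sketch, not the slides’ own derivation; $S'$ replaces the single point $x_i$ by $x_i'$):

```latex
\begin{align*}
|f(S) - f(S')|
  &\le |R(h_S, S) - R(h_{S'}, S')| + |\hat{R}(h_S, S) - \hat{R}(h_{S'}, S')| \\
  &\le \underbrace{\mathbb{E}_x\big[|c(h_S, x) - c(h_{S'}, x)|\big]}_{\le\, \beta}
   + \underbrace{\frac{1}{m}\sum_{j \ne i} |c(h_S, x_j) - c(h_{S'}, x_j)|}_{\le\, \beta}
   + \underbrace{\frac{1}{m}\,|c(h_S, x_i) - c(h_{S'}, x_i')|}_{\le\, M/m} \\
  &\le 2\beta + \frac{M}{m}.
\end{align*}
```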
Generalization Bound
• Applying McDiarmid’s Inequality leads to, for all $\epsilon > 0$,
$$\Pr\big[R(h, S) - \hat{R}(h, S) - \beta \ge \epsilon\big] \le \exp\left(\frac{-2\epsilon^2}{m\left(2\beta + \frac{M}{m}\right)^2}\right)$$
• Or,
$$\Pr\big[R(h, S) - \hat{R}(h, S) \ge \beta + \epsilon\big] \le \exp\left(\frac{-2\epsilon^2 m}{(2\beta m + M)^2}\right)$$
• Note that for an effective bound, we need $\beta = o(1/\sqrt{m})$.
• With confidence $1 - \delta$,
$$R(h, S) \le \hat{R}(h, S) + \beta + (2\beta m + M)\sqrt{\frac{\ln(1/\delta)}{2m}}.$$
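Plugging numbers in makes the $\beta = o(1/\sqrt{m})$ remark concrete. A sketch (all constants and the function name are illustrative), evaluating the bound with $\beta = 1/m$:

```python
import math

def stability_gen_bound(emp_risk, beta, M, m, delta):
    # R(h,S) <= emp_risk + beta + (2*beta*m + M) * sqrt(ln(1/delta) / (2m))
    return emp_risk + beta + (2 * beta * m + M) * math.sqrt(math.log(1 / delta) / (2 * m))

# With beta = 1/m (typical for regularized algorithms), the slack shrinks as m grows.
for m in (100, 10_000, 1_000_000):
    print(m, round(stability_gen_bound(0.05, 1.0 / m, M=1.0, m=m, delta=0.01), 4))
```

With $\beta = 1/\sqrt{m}$ instead, the term $2\beta m \sqrt{\ln(1/\delta)/(2m)}$ stays bounded away from zero, which is exactly why the $o(1/\sqrt{m})$ condition is needed.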
Determining $\beta$
• Consider the regularization-based objective function:
$$F(g, S) = \|g\|_K^2 + \frac{C}{m}\sum_{i=1}^m c(g, x_i).$$
• Need two technical definitions / observations:
• $\sigma$-admissibility: $\forall h, h' \in H$, $\forall x \in \mathcal{X}$,
$$|c(h', x) - c(h, x)| \le \sigma |(h' - h)(x)|.$$
• Bounded kernel: $\forall x \in \mathcal{X}$, $K(x, x) \le \kappa^2$.
Determining $\beta$
• Consider the regularization-based objective function:
$$F(g, S) = \|g\|_K^2 + \frac{C}{m}\sum_{i=1}^m c(g, x_i).$$
• Consider two sets, $S$ and $S'$, such that $S' = S \setminus \{x_i\} \cup \{x_i'\}$, where $x_i \in S$.
• Let $h = \arg\min_g F(g, S)$ and $h' = \arg\min_g F(g, S')$. Let $\Delta h = h' - h$.
• $F(g, S)$ is convex in $g$. Thus, $F(h, S) - F(h + t\Delta h, S) \le 0$ and $F(h', S') - F(h' - t\Delta h, S') \le 0$.
• This leads to:
$$\|h\|_K^2 - \|h + t\Delta h\|_K^2 + \|h'\|_K^2 - \|h' - t\Delta h\|_K^2 \le \frac{2t\sigma\kappa C \|\Delta h\|_K}{m}.$$
Determining $\beta$
• Finally, observe that in an RKHS:
$$\|h\|_K^2 - \|h + t\Delta h\|_K^2 + \|h'\|_K^2 - \|h' - t\Delta h\|_K^2 = 2t(1 - t)\|\Delta h\|_K^2.$$
• Put the pieces together to derive a bound.
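One way to put the pieces together (a sketch, assuming $\sigma$-admissibility of the cost and the kernel bound $K(x, x) \le \kappa^2$, so that $|g(x)| \le \kappa \|g\|_K$):

```latex
\begin{align*}
2t(1 - t)\|\Delta h\|_K^2
  &\le \frac{2t\sigma\kappa C\,\|\Delta h\|_K}{m}
  \quad\Longrightarrow\quad
  \|\Delta h\|_K \le \frac{\sigma\kappa C}{(1 - t)\,m}
  \;\xrightarrow{\;t \to 0\;}\;
  \frac{\sigma\kappa C}{m}. \\
\intertext{Hence, for every $x \in \mathcal{X}$,}
|c(h_S, x) - c(h_{S'}, x)|
  &\le \sigma\,|\Delta h(x)|
  \le \sigma\kappa\,\|\Delta h\|_K
  \le \frac{\sigma^2 \kappa^2 C}{m}
  \;=\; \beta.
\end{align*}
```

Note $\beta = O(1/m)$, well within the $o(1/\sqrt{m})$ requirement of the generalization bound.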
Application - Chromatic Number
• Random Graph: Given a number of vertices $n$ and an edge probability $p$, define $G(n, p)$ as a random graph with:
• vertices $\{1, \ldots, n\}$
• edges (random): $\forall i, j$, $(i, j) \in E$ with probability $p$.
• Chromatic number: the minimum number of colors needed to color the vertices of a graph s.t. adjacent vertices are colored differently.
• Notation: Let $\chi(G)$ be the chromatic number of $G$.
• Vertex exposure martingale: the sequence of random variables $Z_k$, $1 \le k \le n$, given the edges between the first $k$ vertices:
$$Z_k = \mathbb{E}\big[\chi(G) \mid E' \subseteq E,\ (i, j) \in E' \Leftrightarrow (i, j) \in E\ \forall\, i, j \le k\big].$$
Chromatic Number
• Observation 1: $Z_0 = \mathbb{E}[\chi(G)]$, $Z_n = \chi(G)$.
• Observation 2: $|Z_k - Z_{k-1}| \le 1$, $1 \le k \le n$.
• Using $Z_n - Z_0 = \sum_{k=1}^n (Z_k - Z_{k-1})$, and setting $\epsilon = \lambda\sqrt{n}$, it is easy to show:
$$\Pr\left[\frac{1}{\sqrt{n}}\big(\chi(G) - \mathbb{E}[\chi(G)]\big) \ge \lambda\right] \le e^{-2\lambda^2}.$$
• Notes:
• determining the chromatic number is NP-hard.
• finding a $k$-coloring given that $\chi(G) = k$ is also NP-hard.
• there are more sophisticated analyses of $\chi(G)$ for random $G$.
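The bounded-difference condition $c_k = 1$ can be tested directly on small graphs by brute force (a sketch; the helper names are illustrative, and exact $\chi$ is only feasible for tiny $n$):

```python
import itertools
import random

def chromatic_number(n, edges):
    # Smallest k admitting a proper k-coloring of vertices 0..n-1 (brute force).
    for k in range(1, n + 1):
        for coloring in itertools.product(range(k), repeat=n):
            if all(coloring[u] != coloring[v] for u, v in edges):
                return k
    return n

def random_edges(n, p, rng):
    return {(u, v) for u in range(n) for v in range(u + 1, n) if rng.random() < p}

# Re-randomizing the edges at a single vertex changes chi by at most 1,
# since both graphs agree after deleting that vertex.
rng = random.Random(0)
n, p, v = 6, 0.5, 3
for _ in range(10):
    edges = random_edges(n, p, rng)
    other = {e for e in edges if v not in e} | {
        tuple(sorted((u, v))) for u in range(n) if u != v and rng.random() < p
    }
    assert abs(chromatic_number(n, edges) - chromatic_number(n, other)) <= 1
```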
Conclusion
• The condition to apply McDiarmid’s inequality is relatively simple to verify.
• Provides an easy way of deriving generalization bounds.
References
• Kazuoki Azuma. Weighted sums of certain dependent random variables. Tohoku Mathematical Journal, 19:357–367, 1967.
• Olivier Bousquet and André Elisseeff. Stability and generalization. Journal of Machine Learning Research, 2:499–526, 2002.
• Colin McDiarmid. On the method of bounded differences. In Surveys in Combinatorics, pages 148–188. Cambridge University Press, Cambridge, 1989.
• Noga Alon and Joel H. Spencer. The Probabilistic Method. Wiley, New York, 1992.