Appendix A: A Quick Primer on Discrete Probability

In this appendix, we develop some basic ideas in discrete probability theory. We note from the outset that some of the definitions given here are no longer correct in the setting of continuous probability theory.

Let $\Omega$ be a finite or countably infinite set, and let $2^\Omega$ denote the set of subsets of $\Omega$. An element $A \in 2^\Omega$ is simply a subset of $\Omega$, but in the language of probability it is called an event. A probability measure on $\Omega$ is a function $P : 2^\Omega \to [0,1]$ satisfying $P(\emptyset) = 0$, $P(\Omega) = 1$, and which is $\sigma$-additive; that is, for any $1 \le N \le \infty$, one has $P(\cup_{n=1}^N A_n) = \sum_{n=1}^N P(A_n)$ whenever the events $\{A_n\}_{n=1}^N$ are disjoint. From this $\sigma$-additivity, it follows that $P$ is uniquely determined by $\{P(\{x\})\}_{x \in \Omega}$. Using the $\sigma$-additivity on disjoint events, it is not hard to prove that $P$ is $\sigma$-sub-additive on arbitrary events; that is, $P(\cup_{n=1}^N A_n) \le \sum_{n=1}^N P(A_n)$, for arbitrary events $\{A_n\}_{n=1}^N$. See Exercise A.1. The pair $(\Omega, P)$ is called a probability space.

If $C$ and $D$ are events and $P(C) > 0$, then the conditional probability of $D$ given $C$ is denoted by $P(D|C)$ and is defined by
$$P(D|C) = \frac{P(C \cap D)}{P(C)}.$$
Note that $P(\cdot\,|C)$ is itself a probability measure on $\Omega$. Two events $C$ and $D$ are called independent if $P(C \cap D) = P(C)P(D)$. Clearly then, $C$ and $D$ are independent if either $P(C) = 0$ or $P(D) = 0$. If $P(C), P(D) > 0$, it is easy to check that independence is equivalent to either of the following two equalities: $P(D|C) = P(D)$ or $P(C|D) = P(C)$. Consider a collection $\{C_n\}_{n=1}^N$ of events, with $1 \le N \le \infty$. This collection of events is said to be independent if for any finite subset $\{C_{n_j}\}_{j=1}^m$ of the events, one has $P(\cap_{j=1}^m C_{n_j}) = \prod_{j=1}^m P(C_{n_j})$.

Let $(\Omega, P)$ be a probability space. A function $X : \Omega \to \mathbb{R}$ is called a (discrete, real-valued) random variable. For $B \subset \mathbb{R}$, we write $\{X \in B\}$ to denote the event $X^{-1}(B) = \{\omega \in \Omega : X(\omega) \in B\}$, the inverse image of $B$. When considering the probability of the event $\{X \in B\}$ or the event $\{X = x\}$, we write $P(X \in B)$ or $P(X = x)$, instead of $P(\{X \in B\})$ or $P(\{X = x\})$. The distribution of the random variable $X$ is the probability measure $\mu_X$ on $\mathbb{R}$ defined by $\mu_X(B) = P(X \in B)$, for $B \subset \mathbb{R}$. The function $p_X(x) := P(X = x)$ is called the probability function or the discrete density function for $X$.

R.G. Pinsky, Problems from the Discrete to the Continuous, Universitext, DOI 10.1007/978-3-319-07965-3, © Springer International Publishing Switzerland 2014

The expected value or expectation $EX$ of a random variable $X$ is defined by
$$EX = \sum_{x \in \mathbb{R}} x \, P(X = x) = \sum_{x \in \mathbb{R}} x \, p_X(x), \quad \text{if } \sum_{x \in \mathbb{R}} |x| \, P(X = x) < \infty.$$
Note that the set of $x \in \mathbb{R}$ for which $P(X = x) > 0$ is either finite or countably infinite; thus, these summations are well defined. We frequently denote $EX$ by $\mu$. If $P(X \ge 0) = 1$ and the condition above in the definition of $EX$ does not hold, then we write $EX = \infty$. In the sequel, when we say that the expectation of $X$ "exists," we mean that $\sum_{x \in \mathbb{R}} |x| \, P(X = x) < \infty$.
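The definition of $EX$ translates directly into a few lines of code once the probability function $p_X$ is stored as a table. A minimal sketch (the fair-die distribution here is just an illustrative choice, not from the text):

```python
# Expected value of a discrete random variable, following the
# definition EX = sum over x of x * p_X(x).

def expectation(pmf):
    """pmf: dict mapping a value x to P(X = x)."""
    return sum(x * p for x, p in pmf.items())

# Illustration: a fair six-sided die.
die = {x: 1 / 6 for x in range(1, 7)}
print(expectation(die))  # 3.5
```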

Given a function $g : \mathbb{R} \to \mathbb{R}$ and a random variable $X$, we can define a new random variable $Y = g(X)$. One can calculate $EY$ according to the definition of expectation above or in the following equivalent way:
$$EY = \sum_{x \in \mathbb{R}} g(x) \, P(X = x), \quad \text{if } \sum_{x \in \mathbb{R}} |g(x)| \, P(X = x) < \infty.$$

For $n \in \mathbb{N}$, the $n$th moment of $X$ is defined by
$$EX^n = \sum_{x \in \mathbb{R}} x^n \, P(X = x), \quad \text{if } \sum_{x \in \mathbb{R}} |x|^n \, P(X = x) < \infty.$$

If $\mu = EX$ exists, then one defines the variance of $X$, denoted by $\sigma^2$ or $\sigma^2(X)$ or $\mathrm{Var}(X)$, by
$$\sigma^2 = E(X - \mu)^2 = \sum_{x \in \mathbb{R}} (x - \mu)^2 \, P(X = x).$$
Of course, it is possible to have $\sigma^2 = \infty$. It is easy to check that
$$\sigma^2(X) = EX^2 - \mu^2. \tag{A.1}$$

Chebyshev's inequality is a fundamental inequality involving the expected value and the variance.

Proposition A.1 (Chebyshev's Inequality). Let $X$ be a random variable with expectation $\mu$ and finite variance $\sigma^2$. Then for all $\lambda > 0$,
$$P(|X - \mu| \ge \lambda) \le \frac{\sigma^2}{\lambda^2}.$$

Proof.
$$P(|X - \mu| \ge \lambda) = \sum_{x \in \mathbb{R} : |x - \mu| \ge \lambda} P(X = x) \le \sum_{x \in \mathbb{R} : |x - \mu| \ge \lambda} \frac{(x - \mu)^2}{\lambda^2} \, P(X = x) \le \sum_{x \in \mathbb{R}} \frac{(x - \mu)^2}{\lambda^2} \, P(X = x) = \frac{\sigma^2}{\lambda^2}. \qquad \square$$
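Because the proof works with an exact sum over the probability function, Chebyshev's bound can be checked exactly on any small discrete distribution, not just by simulation. A sketch (the $\mathrm{Bin}(10, 0.3)$ probability function below is purely an illustrative choice):

```python
# Exact check of Chebyshev's inequality P(|X - mu| >= lam) <= var/lam^2
# by enumerating a small discrete probability function.

from math import comb

n, p = 10, 0.3
pmf = {j: comb(n, j) * p**j * (1 - p)**(n - j) for j in range(n + 1)}

mu = sum(x * q for x, q in pmf.items())
var = sum((x - mu)**2 * q for x, q in pmf.items())

for lam in (1.0, 2.0, 3.0):
    tail = sum(q for x, q in pmf.items() if abs(x - mu) >= lam)
    assert tail <= var / lam**2 + 1e-12  # the Chebyshev bound holds
    print(lam, round(tail, 4), "<=", round(var / lam**2, 4))
```

The bound is typically far from tight; the printout makes the gap visible.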

Let $\{X_j\}_{j=1}^n$ be a finite collection of random variables on a probability space $(\Omega, P)$. We call $X = (X_1, \dots, X_n)$ a random vector. The joint probability function of these random variables, or equivalently, the probability function of the random vector, is given by
$$p_X(x) = p_X(x_1, \dots, x_n) := P(X_1 = x_1, \dots, X_n = x_n) = P(X = x), \quad x_i \in \mathbb{R}, \ i = 1, \dots, n, \ \text{where } x = (x_1, \dots, x_n).$$
It follows that $\sum_{x_j \in \mathbb{R}, \, j \in [n] \setminus \{i\}} p_X(x) = P(X_i = x_i)$. For any function $H : \mathbb{R}^n \to \mathbb{R}$, we define
$$EH(X) = \sum_{x \in \mathbb{R}^n} H(x) \, p_X(x), \quad \text{if } \sum_{x \in \mathbb{R}^n} |H(x)| \, p_X(x) < \infty.$$
In particular then, if $EX_j$ exists, it can be written as $EX_j = \sum_{x \in \mathbb{R}^n} x_j \, p_X(x)$.

Similarly, if $EX_k$ exists for all $k$, then we have
$$E \sum_{k=1}^n c_k X_k = \sum_{x \in \mathbb{R}^n} \Big( \sum_{k=1}^n c_k x_k \Big) p_X(x) = \sum_{k=1}^n c_k \Big( \sum_{x \in \mathbb{R}^n} x_k \, p_X(x) \Big).$$
It follows from this that the expectation is linear; that is, if $EX_k$ exists for $k = 1, \dots, n$, then
$$E \sum_{k=1}^n c_k X_k = \sum_{k=1}^n c_k \, EX_k,$$
for any real numbers $\{c_k\}_{k=1}^n$.

Let $\{X_j\}_{j=1}^N$ be a collection of random variables on a probability space $(\Omega, P)$, where $1 \le N \le \infty$. The random variables are called independent if for every finite $n \le N$, one has
$$P(X_1 = x_1, X_2 = x_2, \dots, X_n = x_n) = \prod_{j=1}^n P(X_j = x_j), \quad \text{for all } x_j \in \mathbb{R}, \ j = 1, 2, \dots, n.$$

Let $\{f_i\}_{i=1}^n$ be real-valued functions with $f_i$ defined at least on the set $\{x \in \mathbb{R} : P(X_i = x) > 0\}$. Assume that $E|f_i(X_i)| < \infty$, for $i = 1, \dots, n$. From the definition of independence it is easy to show that if $\{X_j\}_{j=1}^n$ are independent, then
$$E \prod_{i=1}^n f_i(X_i) = \prod_{i=1}^n E f_i(X_i). \tag{A.2}$$

The variance is of course not linear. However, the variance of a sum of independent random variables is equal to the sum of the variances of the random variables:

If $\{X_i\}_{i=1}^n$ are independent random variables, then
$$\sigma^2 \Big( \sum_{i=1}^n X_i \Big) = \sum_{i=1}^n \sigma^2(X_i). \tag{A.3}$$

It suffices to prove (A.3) for $n = 2$ and then use induction. Let $\mu_i = EX_i$, $i = 1, 2$. We have
$$\sigma^2(X_1 + X_2) = E\big(X_1 + X_2 - E(X_1 + X_2)\big)^2 = E\big((X_1 - \mu_1) + (X_2 - \mu_2)\big)^2 = E(X_1 - \mu_1)^2 + E(X_2 - \mu_2)^2 + 2E(X_1 - \mu_1)(X_2 - \mu_2) = \sigma^2(X_1) + \sigma^2(X_2),$$
where the last equality follows because (A.2) shows that $E(X_1 - \mu_1)(X_2 - \mu_2) = E(X_1 - \mu_1) \, E(X_2 - \mu_2) = 0$.
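For independent discrete random variables, the probability function of the sum is the convolution of the individual probability functions, so (A.3) can be verified exactly in code. A sketch (the two probability functions below are arbitrary illustrative choices):

```python
# Verify sigma^2(X1 + X2) = sigma^2(X1) + sigma^2(X2) for independent
# discrete random variables, by convolving their probability functions.

from collections import defaultdict

def var(pmf):
    mu = sum(x * p for x, p in pmf.items())
    return sum((x - mu)**2 * p for x, p in pmf.items())

def convolve(pmf1, pmf2):
    """Probability function of X1 + X2 when X1, X2 are independent."""
    out = defaultdict(float)
    for x1, p1 in pmf1.items():
        for x2, p2 in pmf2.items():
            out[x1 + x2] += p1 * p2
    return dict(out)

p1 = {0: 0.5, 1: 0.5}            # Ber(1/2)
p2 = {-1: 0.2, 0: 0.3, 4: 0.5}   # an arbitrary distribution
s = convolve(p1, p2)
print(var(s), var(p1) + var(p2))  # the two numbers agree
```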

Chebyshev's inequality and (A.3) allow for an exceedingly short proof of an important result: the weak law of large numbers for sums of independent, identically distributed (IID) random variables.

Theorem A.1. Let $\{X_n\}_{n=1}^\infty$ be a sequence of independent, identically distributed random variables and assume that their common variance $\sigma^2$ is finite. Denote their common expectation by $\mu$. Let $S_n = \sum_{j=1}^n X_j$. Then for any $\epsilon > 0$,
$$\lim_{n \to \infty} P\Big( \Big| \frac{S_n}{n} - \mu \Big| \ge \epsilon \Big) = 0.$$

Proof. We have $ES_n = n\mu$, and since the random variables are independent and identically distributed, it follows from (A.3) that $\sigma^2(S_n) = n\sigma^2$. Now applying Chebyshev's inequality to $S_n$ with $\lambda = n\epsilon$ gives
$$P(|S_n - n\mu| \ge n\epsilon) \le \frac{n\sigma^2}{(n\epsilon)^2},$$
which proves the theorem. $\square$
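The theorem is easy to see empirically: as $n$ grows, the frequency with which $S_n/n$ strays from $\mu$ collapses. A Monte Carlo sketch for IID die rolls (the sample sizes, $\epsilon$, and the die itself are illustrative choices):

```python
# Illustrate the weak law of large numbers: for IID rolls of a fair die
# (mu = 3.5), estimate P(|S_n/n - mu| >= eps) for growing n.

import random

random.seed(0)
mu, eps, trials = 3.5, 0.25, 2000
freq = {}

for n in (10, 100, 1000):
    bad = 0
    for _ in range(trials):
        s_n = sum(random.randint(1, 6) for _ in range(n))
        if abs(s_n / n - mu) >= eps:
            bad += 1
    freq[n] = bad / trials
    print(n, freq[n])  # the estimated probability shrinks with n
```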

Remark. The weak law of large numbers is a first moment result. It holds even without the finite variance assumption, but the proof is much more involved.

The above weak law of large numbers is actually a particular case of the following weak law of large numbers.

Proposition A.2. Let $\{Y_n\}_{n=1}^\infty$ be random variables. Assume that
$$\sigma^2(Y_n) = o\big((EY_n)^2\big), \quad \text{as } n \to \infty.$$
Then for any $\epsilon > 0$,
$$\lim_{n \to \infty} P\Big( \Big| \frac{Y_n}{EY_n} - 1 \Big| \ge \epsilon \Big) = 0.$$

Proof. By Chebyshev's inequality, we have
$$P(|Y_n - EY_n| \ge \epsilon |EY_n|) \le \frac{\sigma^2(Y_n)}{(\epsilon \, EY_n)^2}. \qquad \square$$

If $X$ and $Y$ are random variables on a probability space $(\Omega, P)$, and if $P(X = x) > 0$, then the conditional probability function of $Y$ given $X = x$ is defined by
$$p_{Y|X}(y|x) := P(Y = y | X = x) = \frac{P(X = x, Y = y)}{P(X = x)}.$$
The conditional expectation of $Y$ given $X = x$ is defined by
$$E(Y | X = x) = \sum_{y \in \mathbb{R}} y \, P(Y = y | X = x) = \sum_{y \in \mathbb{R}} y \, p_{Y|X}(y|x), \quad \text{if } \sum_{y \in \mathbb{R}} |y| \, P(Y = y | X = x) < \infty.$$
It is easy to verify that
$$EY = \sum_{x \in \mathbb{R}} E(Y | X = x) \, P(X = x),$$
where $E(Y | X = x) \, P(X = x) := 0$, if $P(X = x) = 0$.
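The identity $EY = \sum_x E(Y|X=x)P(X=x)$ can be checked mechanically on any finite joint probability function. A sketch (the joint table below is an arbitrary illustrative example):

```python
# Check EY = sum over x of E(Y | X = x) * P(X = x) on a small
# joint probability function p_{X,Y}.

joint = {  # (x, y) -> P(X = x, Y = y)
    (0, 1): 0.1, (0, 2): 0.3,
    (1, 1): 0.2, (1, 5): 0.4,
}

# Direct computation of EY from the joint distribution.
ey_direct = sum(y * p for (_, y), p in joint.items())

# Marginal of X, then the conditional-expectation decomposition.
px = {}
for (x, _), p in joint.items():
    px[x] = px.get(x, 0.0) + p

ey_conditional = 0.0
for x, pxv in px.items():
    cond = sum(y * p / pxv for (xx, y), p in joint.items() if xx == x)
    ey_conditional += cond * pxv  # E(Y | X = x) P(X = x)

print(ey_direct, ey_conditional)  # the two values agree
```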

A random variable $X$ that takes on only two values, 0 and 1, with $P(X = 1) = p$ and $P(X = 0) = 1 - p$, for some $p \in [0, 1]$, is called a Bernoulli random variable. One writes $X \sim \mathrm{Ber}(p)$. It is trivial to check that $EX = p$ and $\sigma^2(X) = p(1 - p)$.

Let $n \in \mathbb{N}$ and let $p \in [0, 1]$. A random variable $X$ satisfying
$$P(X = j) = \binom{n}{j} p^j (1 - p)^{n - j}, \quad j = 0, 1, \dots, n,$$
is called a binomial random variable, and one writes $X \sim \mathrm{Bin}(n, p)$. The random variable $X$ can be thought of as the number of "successes" in $n$ independent trials, where on each trial there are two possible outcomes, "success" and "failure," and the probability of "success" is $p$ on each trial. Letting $\{Z_i\}_{i=1}^n$ be independent, identically distributed random variables distributed according to $\mathrm{Ber}(p)$, it follows that $X$ can be realized as $X = \sum_{i=1}^n Z_i$. From the formula for the expected value and variance of a Bernoulli random variable, and from the linearity of the expectation and (A.3), the above representation immediately yields $EX = np$ and $\sigma^2(X) = np(1 - p)$.
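The representation $X = \sum_{i=1}^n Z_i$ also gives a direct way to sample a binomial variable, and the sample moments should track $np$ and $np(1-p)$. A sketch (the parameters and trial count are illustrative):

```python
# Realize X ~ Bin(n, p) as a sum of n independent Ber(p) indicators and
# compare sample mean/variance with np and np(1 - p).

import random

random.seed(1)
n, p, trials = 20, 0.3, 20000

samples = [sum(random.random() < p for _ in range(n)) for _ in range(trials)]
mean = sum(samples) / trials
svar = sum((s - mean)**2 for s in samples) / trials

print(mean, n * p)            # sample mean vs np = 6.0
print(svar, n * p * (1 - p))  # sample variance vs np(1 - p) = 4.2
```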

A random variable $X$ satisfying
$$P(X = n) = e^{-\lambda} \frac{\lambda^n}{n!}, \quad n = 0, 1, \dots,$$
where $\lambda > 0$, is called a Poisson random variable, and one writes $X \sim \mathrm{Pois}(\lambda)$. One can check easily that $EX = \lambda$ and $\sigma^2(X) = \lambda$.

Proposition A.3 (Poisson Approximation to the Binomial Distribution). For $n \in \mathbb{N}$ and $p \in [0, 1]$, let $X_{n,p} \sim \mathrm{Bin}(n, p)$. For $\lambda > 0$, let $X_\lambda \sim \mathrm{Pois}(\lambda)$. Then
$$\lim_{n \to \infty, \, p \to 0, \, np \to \lambda} P(X_{n,p} = j) = P(X_\lambda = j), \quad j = 0, 1, \dots. \tag{A.4}$$

Proof. By assumption, we have $p = \frac{\lambda_n}{n}$, where $\lim_{n \to \infty} \lambda_n = \lambda$. We have
$$P(X_{n,p} = j) = \binom{n}{j} p^j (1 - p)^{n - j} = \frac{n(n-1) \cdots (n - j + 1)}{j!} \Big( \frac{\lambda_n}{n} \Big)^j \Big( 1 - \frac{\lambda_n}{n} \Big)^{n - j} = \frac{1}{j!} \lambda_n^j \, \frac{n(n-1) \cdots (n - j + 1)}{n^j} \Big( 1 - \frac{\lambda_n}{n} \Big)^{n - j};$$
thus,
$$\lim_{n \to \infty, \, p \to 0, \, np \to \lambda} P(X_{n,p} = j) = e^{-\lambda} \frac{\lambda^j}{j!} = P(X_\lambda = j). \qquad \square$$
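The rate of convergence in (A.4) is easy to observe numerically by comparing the $\mathrm{Bin}(n, \lambda/n)$ and $\mathrm{Pois}(\lambda)$ probability functions for growing $n$. A sketch (taking $\lambda = 2$ as an illustrative value):

```python
# Illustrate (A.4): Bin(n, lambda/n) probabilities approach Pois(lambda)
# probabilities as n grows.

from math import comb, exp, factorial

lam = 2.0

def binom_pmf(n, j):
    p = lam / n
    return comb(n, j) * p**j * (1 - p)**(n - j)

def pois_pmf(j):
    return exp(-lam) * lam**j / factorial(j)

for n in (10, 100, 1000):
    err = max(abs(binom_pmf(n, j) - pois_pmf(j)) for j in range(11))
    print(n, err)  # the maximal pointwise error shrinks with n
```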

Equation (A.4) is an example of weak convergence of random variables or distributions. In general, if $\{X_n\}_{n=1}^\infty$ are random variables with distributions $\{\mu_{X_n}\}_{n=1}^\infty$, and $X$ is a random variable with distribution $\mu_X$, then we say that $X_n$ converges weakly to $X$, or $\mu_{X_n}$ converges weakly to $\mu_X$, if $\lim_{n \to \infty} P(X_n \le x) = P(X \le x)$, for all $x \in \mathbb{R}$ for which $P(X = x) = 0$, or equivalently, if $\lim_{n \to \infty} \mu_{X_n}((-\infty, x]) = \mu_X((-\infty, x])$, for all $x \in \mathbb{R}$ for which $\mu_X(\{x\}) = 0$. Thus, for example, if $P(X_n = \frac{1}{n}) = P(X_n = 1 + \frac{1}{n}) = \frac{1}{2}$, for $n = 1, 2, \dots$, and $P(X = 0) = P(X = 1) = \frac{1}{2}$, then $X_n$ converges weakly to $X$, since $\lim_{n \to \infty} P(X_n \le x) = P(X \le x)$, for all $x \in \mathbb{R} \setminus \{0, 1\}$. See also Exercise A.4.

Exercise A.1. Use the $\sigma$-additivity property of probability measures on disjoint sets to prove $\sigma$-sub-additivity on arbitrary sets; that is, $P(\cup_{n=1}^N A_n) \le \sum_{n=1}^N P(A_n)$, for arbitrary events $\{A_n\}_{n=1}^N$, where $1 \le N \le \infty$. (Hint: Rewrite $\cup_{n=1}^N A_n$ as a disjoint union $\cup_{n=1}^N B_n$, by letting $B_1 = A_1$, $B_2 = A_2 - A_1$, $B_3 = A_3 - A_2 - A_1$, etc.)

Exercise A.2. Prove that $P(A_1 \cup A_2) = P(A_1) + P(A_2) - P(A_1 \cap A_2)$, for arbitrary events $A_1, A_2$. Then prove more generally that for any finite $n$ and arbitrary events $\{A_k\}_{k=1}^n$, one has
$$P(\cup_{k=1}^n A_k) = \sum_{1 \le i \le n} P(A_i) - \sum_{1 \le i < j \le n} P(A_i \cap A_j) + \sum_{1 \le i < j < k \le n} P(A_i \cap A_j \cap A_k) - \cdots + (-1)^{n-1} P(A_1 \cap A_2 \cdots \cap A_n).$$
This result is known as the principle of inclusion-exclusion.
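The inclusion-exclusion formula of Exercise A.2 can be spot-checked by brute force on a finite probability space. A sketch (the uniform space on $\{0,\dots,19\}$ and the randomly drawn events are illustrative choices):

```python
# Verify the inclusion-exclusion formula on random events in a small
# finite probability space with the uniform measure.

import random
from itertools import combinations

random.seed(2)
omega = set(range(20))
P = lambda A: len(A) / len(omega)  # uniform probability measure

events = [set(random.sample(sorted(omega), random.randint(3, 10)))
          for _ in range(4)]

lhs = P(set().union(*events))  # P(A_1 U ... U A_n), computed directly

rhs = 0.0  # the alternating inclusion-exclusion sum
for r in range(1, len(events) + 1):
    for combo in combinations(events, r):
        rhs += (-1) ** (r - 1) * P(set.intersection(*combo))

print(lhs, rhs)  # the two values agree
```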

Exercise A.3. Let $(\Omega, P)$ be a probability space and let $R \ge 2$ be an integer. For $A \subset \Omega$, recall that the complement $A^c$ of $A$ is defined by $A^c = \Omega - A$. Prove that if the events $\{A_k\}_{k=1}^R$ are independent, then the complementary events $\{A_k^c\}_{k=1}^R$ are also independent. (Hint: By the definition of independence, we have
$$P(\cap_{j=1}^\ell B_j) = \prod_{j=1}^\ell P(B_j), \quad \text{for any } \ell \le R \text{ and any sub-collection } \{B_j\}_{j=1}^\ell \text{ of } \{A_k\}_{k=1}^R. \tag{A.5}$$
Using this, we need to prove that $P(\cap_{j=1}^\ell B_j^c) = \prod_{j=1}^\ell P(B_j^c)$, for any sub-collection $\{B_j^c\}_{j=1}^\ell$ of $\{A_k^c\}_{k=1}^R$. Let $p_j = P(B_j)$ and $p = P(\cap_{j=1}^\ell B_j^c)$. Then we need to prove that $p = \prod_{j=1}^\ell (1 - p_j)$. Write
$$\prod_{j=1}^\ell (1 - p_j) = 1 - \sum_{1 \le i \le \ell} p_i + \sum_{1 \le i < j \le \ell} p_i p_j - \cdots,$$
and use (A.5) along with the principle of inclusion-exclusion, which appears in Exercise A.2.)

Exercise A.4. Using (A.4), show that
$$\lim_{n \to \infty, \, p \to 0, \, np \to \lambda} P(X_{n,p} \le x) = P(X_\lambda \le x), \quad \text{for all } x \in \mathbb{R}.$$

Appendix B: Power Series and Generating Functions

We review without proof some basic results concerning power series. For more details, the reader should consult an advanced calculus or undergraduate analysis text. We also illustrate the utility of generating functions by analyzing the one that arises from the Fibonacci sequence.

Let $\{a_n\}_{n=0}^\infty$ be a sequence of real numbers. Define formally the generating function $F(t)$ of $\{a_n\}_{n=0}^\infty$ by
$$F(t) = \sum_{n=0}^\infty a_n t^n, \tag{B.1}$$
where $t \in \mathbb{R}$. We say "formally" because we have made the definition before determining for which values of $t$ the power series on the right-hand side above converges. The power series converges trivially for $t = 0$, and it is possible that it converges only for $t = 0$; for example, if $a_n = n!$.

The power series $\sum_{n=0}^\infty a_n t^n$ converges absolutely if $\sum_{n=0}^\infty |a_n t^n| < \infty$. The power series is uniformly, absolutely convergent for $|t| \le \delta$ if
$$\lim_{N \to \infty} \sup_{|t| \le \delta} \sum_{n=N}^\infty |a_n t^n| = 0;$$
that is, if the tail of the series $\sum_{n=0}^\infty |a_n t^n|$ converges to 0 uniformly over $|t| \le \delta$. We state four fundamental results concerning the convergence of power series:

1. If the power series converges for some number $t_0 \ne 0$, then necessarily the power series converges absolutely and uniformly for $|t| \le \delta$, for all $\delta < |t_0|$.

2. There exists an extended real number $r_0 \in [0, \infty]$ such that the power series $\sum_{n=0}^\infty a_n t^n$ converges absolutely if $|t| < r_0$ and diverges if $|t| > r_0$.

The number $r_0$ in (2) is called the radius of convergence of the power series.

3. The radius of convergence is given by the formula
$$r_0 = \frac{1}{\limsup_{n \to \infty} \sqrt[n]{|a_n|}}.$$

4. If the power series is uniformly, absolutely convergent for $|t| \le \delta$, then the function $F(t)$ in (B.1) is infinitely differentiable for $|t| < \delta$, and its derivatives are obtained via term-by-term differentiation in the power series; in particular, $F'(t) = \sum_{n=0}^\infty n a_n t^{n-1}$.

The generating function often provides an efficient method for obtaining information about the sequence $\{a_n\}_{n=0}^\infty$. Typically, this will occur when the generating function can be written in a nice closed form and analyzed. This analysis then allows one to obtain information about the coefficients in the generating function's power series expansion, and these coefficients are of course $\{a_n\}_{n=0}^\infty$. We illustrate this in the case of the famous Fibonacci sequence.

Recall that the sequence of Fibonacci numbers is defined recursively by $f_0 = 0$, $f_1 = 1$, and
$$f_n = f_{n-1} + f_{n-2}, \quad \text{for } n \ge 2. \tag{B.2}$$
The first few Fibonacci numbers are 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144. We will obtain a closed form for the generating function
$$F(t) = \sum_{n=0}^\infty f_n t^n \tag{B.3}$$
of the Fibonacci numbers. Multiply both sides of (B.2) by $t^n$ and then sum both sides over $n$, with $n$ running from 2 to $\infty$. This gives us
$$\sum_{n=2}^\infty f_n t^n = \sum_{n=2}^\infty f_{n-1} t^n + \sum_{n=2}^\infty f_{n-2} t^n.$$
Since $f_0 = 0$ and $f_1 = 1$, the left-hand side above is equal to $F(t) - t$. Factoring out $t$ from the first term and $t^2$ from the second term on the right-hand side above, and using the fact that $f_0 = 0$, one sees that the right-hand side above is equal to $tF(t) + t^2 F(t)$. Thus, we obtain the equation
$$F(t) - t = tF(t) + t^2 F(t),$$
which gives a closed-form expression for $F$; namely, $F(t) = \frac{t}{1 - t - t^2}$. Up until now we have ignored the question of convergence. However, the above formula gives us the answer. The roots of the polynomial $t^2 + t - 1$ are $r_+ := \frac{-1 + \sqrt{5}}{2}$ and $r_- := \frac{-1 - \sqrt{5}}{2}$. Since $|r_+| < |r_-|$, we conclude that the generating function $F(t)$ has radius of convergence $|r_+| = \frac{\sqrt{5} - 1}{2}$. Thus, the generating function of the Fibonacci sequence is given by
$$F(t) = \frac{t}{1 - t - t^2}, \quad |t| < \frac{\sqrt{5} - 1}{2}. \tag{B.4}$$

We now use the method of partial fractions to represent the function $\frac{t}{1 - t - t^2}$ in an explicit power series. Using the fact that $r_+ r_- = -1$, we write
$$t^2 + t - 1 = (t - r_+)(t - r_-) = -(t r_- + 1)(t r_+ + 1);$$
thus,
$$\frac{t}{1 - t - t^2} = \frac{t}{(t r_- + 1)(t r_+ + 1)}. \tag{B.5}$$
For unknown $A$ and $B$, we write
$$\frac{t}{(t r_- + 1)(t r_+ + 1)} = \frac{A}{t r_- + 1} + \frac{B}{t r_+ + 1} = \frac{t(A r_+ + B r_-) + (A + B)}{(t r_- + 1)(t r_+ + 1)}. \tag{B.6}$$
Comparing the left-most and right-most terms in (B.6), we conclude that $A + B = 0$ and $A r_+ + B r_- = 1$. Solving for $A$ and $B$, we obtain $A = \frac{1}{r_+ - r_-} = \frac{1}{\sqrt{5}}$ and $B = \frac{1}{r_- - r_+} = -\frac{1}{\sqrt{5}}$. Thus, from (B.5) and the first equality in (B.6), we arrive at the partial fraction representation
$$\frac{t}{1 - t - t^2} = \frac{1}{\sqrt{5}} \Big( \frac{1}{1 + t r_-} - \frac{1}{1 + t r_+} \Big). \tag{B.7}$$

Since $|r_-| > |r_+|$, both $\frac{1}{1 + t r_-}$ and $\frac{1}{1 + t r_+}$ can be written as geometric series if $|t| < \frac{1}{|r_-|} = \frac{2}{1 + \sqrt{5}} = \frac{\sqrt{5} - 1}{2}$. We have
$$\frac{1}{1 + t r_-} = \sum_{n=0}^\infty (-1)^n (r_-)^n t^n = \sum_{n=0}^\infty \Big( \frac{1 + \sqrt{5}}{2} \Big)^n t^n; \qquad \frac{1}{1 + t r_+} = \sum_{n=0}^\infty (-1)^n (r_+)^n t^n = \sum_{n=0}^\infty \Big( \frac{1 - \sqrt{5}}{2} \Big)^n t^n. \tag{B.8}$$
Thus, from (B.4), (B.7), and (B.8), we obtain
$$F(t) = \sum_{n=0}^\infty \frac{1}{\sqrt{5}} \Big[ \Big( \frac{1 + \sqrt{5}}{2} \Big)^n - \Big( \frac{1 - \sqrt{5}}{2} \Big)^n \Big] t^n. \tag{B.9}$$

Comparing (B.3) with (B.9), we conclude that the $n$th Fibonacci number $f_n$ is given explicitly by
$$f_n = \frac{1}{\sqrt{5}} \Big[ \Big( \frac{1 + \sqrt{5}}{2} \Big)^n - \Big( \frac{1 - \sqrt{5}}{2} \Big)^n \Big]. \tag{B.10}$$
From the explicit formula in (B.10), the asymptotic behavior of $f_n$ is clear:
$$f_n \sim \frac{1}{\sqrt{5}} \Big( \frac{1 + \sqrt{5}}{2} \Big)^n, \quad \text{as } n \to \infty.$$
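The closed form (B.10) is easy to check against the defining recursion (B.2); floating-point evaluation followed by rounding is exact for moderate $n$. A sketch:

```python
# Check the closed form (B.10) against the recursion (B.2).

from math import sqrt

def fib_binet(n):
    s5 = sqrt(5)
    return round((((1 + s5) / 2) ** n - ((1 - s5) / 2) ** n) / s5)

fibs = [0, 1]
for _ in range(2, 31):
    fibs.append(fibs[-1] + fibs[-2])

assert all(fib_binet(n) == fibs[n] for n in range(31))
print(fibs[:13])  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144]
```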

Appendix C: A Proof of Stirling's Formula

Stirling's formula states that
$$n! \sim n^n e^{-n} \sqrt{2\pi n}, \quad \text{as } n \to \infty. \tag{C.1}$$
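Before the proof, the asymptotic (C.1) is worth seeing numerically: the ratio $n! / (n^n e^{-n} \sqrt{2\pi n})$ tends to 1 (from above, at rate roughly $1 + \frac{1}{12n}$, a standard refinement not proved here). A sketch, working with logarithms to avoid overflow:

```python
# Numerically check (C.1): n! / (n^n e^{-n} sqrt(2 pi n)) -> 1.

from math import lgamma, log, pi, exp

def ratio(n):
    # log n! = lgamma(n + 1); compare with the log of Stirling's approximation.
    log_stirling = n * log(n) - n + 0.5 * log(2 * pi * n)
    return exp(lgamma(n + 1) - log_stirling)

for n in (10, 100, 1000):
    print(n, ratio(n))  # the ratio approaches 1
```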

In order to obtain an asymptotic formula for the discrete quantity $n!$, it is extremely useful to be able to embed this quantity in a function of a continuous variable. Integrating by parts and then applying induction shows that $n! = \Gamma(n + 1)$, $n \in \mathbb{N}$, where the gamma function $\Gamma(t)$ is defined by
$$\Gamma(t) = \int_0^\infty x^{t-1} e^{-x} \, dx, \quad t > 0.$$
Thus, one proves Stirling's formula in the following form.

Theorem C.1 (Stirling's Formula).
$$\Gamma(t + 1) \sim t^t e^{-t} \sqrt{2\pi t}, \quad \text{as } t \to \infty. \tag{C.2}$$

Proof. In the literature one can find literally dozens of proofs of Stirling's formula. We present here an elementary proof that uses Laplace's asymptotic method [14]. We begin by giving the intuition for the method. We write
$$\Gamma(t + 1) = \int_0^\infty e^{\phi_t(x)} \, dx, \tag{C.3}$$
where
$$\phi_t(x) = t \log x - x.$$
Now $\phi_t$ takes on its maximum at $x = t$, and the Taylor expansion of $\phi_t$ about $x = t$ starts out as
$$t \log t - t - \frac{(x - t)^2}{2t} =: \hat\phi_t(x).$$
Replacing $\phi_t$ by $\hat\phi_t$, we calculate that
$$\int_0^\infty e^{\hat\phi_t(x)} \, dx = \int_0^\infty e^{t \log t - t - \frac{(x - t)^2}{2t}} \, dx = t^t e^{-t} \int_0^\infty e^{-\frac{(x - t)^2}{2t}} \, dx.$$
Making the substitution $z = \frac{x - t}{\sqrt{t}}$ gives
$$\int_0^\infty e^{-\frac{(x - t)^2}{2t}} \, dx = \sqrt{t} \int_{-\sqrt{t}}^\infty e^{-\frac{1}{2} z^2} \, dz.$$
Since $\int_{-\infty}^\infty e^{-\frac{1}{2} z^2} \, dz = \sqrt{2\pi}$, we conclude that
$$\int_0^\infty e^{\hat\phi_t(x)} \, dx \sim t^t e^{-t} \sqrt{2\pi t}, \quad \text{as } t \to \infty.$$

We now turn to the rigorous proof. We can write $\phi_t$ exactly as
$$\phi_t(t + y) = t \log t - t - t g\Big( \frac{y}{t} \Big),$$
where
$$g(v) = v - \log(1 + v).$$
Substituting this in (C.3) and making the change of variables $x = y + t$, we obtain
$$\Gamma(t + 1) = t^t e^{-t} \int_{-t}^\infty e^{-t g(\frac{y}{t})} \, dy.$$
Making the change of variables $y = \sqrt{t} z$, we have
$$\Gamma(t + 1) = t^t e^{-t} \sqrt{2\pi t} \, N(t), \tag{C.4}$$
where
$$N(t) = \frac{1}{\sqrt{2\pi}} \int_{-\sqrt{t}}^\infty e^{-t g(\frac{z}{\sqrt{t}})} \, dz.$$

We will show that
$$\lim_{t \to \infty} N(t) = 1. \tag{C.5}$$
Now (C.2) follows from (C.4) and (C.5). Fix $L > 0$ and write
$$N(t) = N_L(t) + \frac{1}{\sqrt{2\pi}} T_L^+(t) + \frac{1}{\sqrt{2\pi}} T_L^-(t), \tag{C.6}$$
where
$$N_L(t) = \frac{1}{\sqrt{2\pi}} \int_{-L}^L e^{-t g(\frac{z}{\sqrt{t}})} \, dz$$
and
$$T_L^+(t) = \int_L^\infty e^{-t g(\frac{z}{\sqrt{t}})} \, dz, \qquad T_L^-(t) = \int_{-\sqrt{t}}^{-L} e^{-t g(\frac{z}{\sqrt{t}})} \, dz.$$
From Taylor's remainder formula it follows that for any $\epsilon > 0$ and sufficiently small $v$, one has
$$\frac{1}{2}(1 - \epsilon) v^2 \le g(v) \le \frac{1}{2}(1 + \epsilon) v^2.$$
Thus, $\lim_{t \to \infty} t g(\frac{z}{\sqrt{t}}) = \frac{1}{2} z^2$, uniformly over $z \in [-L, L]$; consequently,
$$\lim_{t \to \infty} N_L(t) = \frac{1}{\sqrt{2\pi}} \int_{-L}^L e^{-\frac{1}{2} z^2} \, dz. \tag{C.7}$$

Since $t \big( g(\frac{z}{\sqrt{t}}) \big)' = \sqrt{t} \big( 1 - \frac{1}{1 + \frac{z}{\sqrt{t}}} \big) = \frac{\sqrt{t} z}{\sqrt{t} + z}$ is increasing in $z$, we have
$$T_L^+(t) \le \frac{\sqrt{t} + L}{\sqrt{t} L} \int_L^\infty t \Big( g\Big( \frac{z}{\sqrt{t}} \Big) \Big)' e^{-t g(\frac{z}{\sqrt{t}})} \, dz = \frac{\sqrt{t} + L}{\sqrt{t} L} e^{-t g(\frac{L}{\sqrt{t}})} = \frac{\sqrt{t} + L}{\sqrt{t} L} e^{-t [\frac{L}{\sqrt{t}} - \log(1 + \frac{L}{\sqrt{t}})]}.$$
By Taylor's formula, we have $\log(1 + \frac{L}{\sqrt{t}}) = \frac{L}{\sqrt{t}} - \frac{L^2}{2t} + O(t^{-\frac{3}{2}})$ as $t \to \infty$; thus,
$$\limsup_{t \to \infty} T_L^+(t) \le \frac{1}{L} e^{-\frac{1}{2} L^2}. \tag{C.8}$$

A very similar argument gives
$$\limsup_{t \to \infty} T_L^-(t) \le \frac{1}{L} e^{-\frac{1}{2} L^2}. \tag{C.9}$$
Now from (C.6)–(C.9), we obtain
$$\frac{1}{\sqrt{2\pi}} \int_{-L}^L e^{-\frac{1}{2} z^2} \, dz \le \liminf_{t \to \infty} N(t) \le \limsup_{t \to \infty} N(t) \le \frac{1}{\sqrt{2\pi}} \int_{-L}^L e^{-\frac{1}{2} z^2} \, dz + \frac{2}{L \sqrt{2\pi}} e^{-\frac{1}{2} L^2}.$$
Since $N(t)$ is independent of $L$, letting $L \to \infty$ above gives (C.5). $\square$

Appendix D: An Elementary Proof of $\sum_{n=1}^\infty \frac{1}{n^2} = \frac{\pi^2}{6}$

The standard way to prove the identity in the title of this appendix is via Fourier series. We give a completely elementary proof, following [1]. Consider the double integral
$$I = \int_0^1 \int_0^1 \frac{1}{1 - xy} \, dx \, dy. \tag{D.1}$$
(Actually, the expression on the right-hand side of (D.1) is an improper integral, because the integrand blows up at $(x, y) = (1, 1)$. Thus, $\int_0^1 \int_0^1 \frac{1}{1 - xy} \, dx \, dy := \lim_{\epsilon \to 0^+} \int_0^{1 - \epsilon} \int_0^{1 - \epsilon} \frac{1}{1 - xy} \, dx \, dy$. Since the integrand is nonnegative, there is no problem applying the standard rules of calculus directly to $\int_0^1 \int_0^1 \frac{1}{1 - xy} \, dx \, dy$.) On the one hand, expanding the integrand in a geometric series and integrating term by term gives
$$I = \int_0^1 \int_0^1 \sum_{n=0}^\infty (xy)^n \, dx \, dy = \sum_{n=0}^\infty \int_0^1 \int_0^1 x^n y^n \, dx \, dy = \sum_{n=0}^\infty \Big( \int_0^1 x^n \, dx \Big) \Big( \int_0^1 y^n \, dy \Big) = \sum_{n=0}^\infty \frac{1}{(n + 1)^2} = \sum_{n=1}^\infty \frac{1}{n^2}. \tag{D.2}$$
(The interchanging of the order of the integration and the summation is justified by the fact that all the summands are nonnegative.)

On the other hand, consider the change of variables $u = \frac{y + x}{2}$, $v = \frac{y - x}{2}$. This transformation rotates the square $[0,1] \times [0,1]$ clockwise by $45^\circ$ and shrinks its sides by the factor $\sqrt{2}$. The new domain is $\{(u, v) : 0 \le u \le \frac{1}{2}, \, -u \le v \le u\} \cup \{(u, v) : \frac{1}{2} \le u \le 1, \, u - 1 \le v \le 1 - u\}$. The Jacobian $\frac{\partial(x, y)}{\partial(u, v)}$ of the transformation is equal to 2, so the area element $dx \, dy$ gets replaced by $2 \, du \, dv$. The function $\frac{1}{1 - xy}$ becomes $\frac{1}{1 - u^2 + v^2}$. Since the function and the domain are symmetric with respect to the $u$-axis, we have
$$I = 4 \int_0^{\frac{1}{2}} \Big( \int_0^u \frac{dv}{1 - u^2 + v^2} \Big) du + 4 \int_{\frac{1}{2}}^1 \Big( \int_0^{1 - u} \frac{dv}{1 - u^2 + v^2} \Big) du.$$

Using the integration formula $\int \frac{dx}{x^2 + a^2} = \frac{1}{a} \arctan \frac{x}{a}$, we obtain
$$I = 4 \int_0^{\frac{1}{2}} \frac{1}{\sqrt{1 - u^2}} \arctan\Big( \frac{u}{\sqrt{1 - u^2}} \Big) du + 4 \int_{\frac{1}{2}}^1 \frac{1}{\sqrt{1 - u^2}} \arctan\Big( \frac{1 - u}{\sqrt{1 - u^2}} \Big) du.$$

Now the derivative of $g(u) := \arctan\big( \frac{u}{\sqrt{1 - u^2}} \big)$ is $\frac{1}{\sqrt{1 - u^2}}$, and the derivative of $h(u) := \arctan\big( \frac{1 - u}{\sqrt{1 - u^2}} \big) = \arctan\big( \sqrt{\frac{1 - u}{1 + u}} \big)$ is $-\frac{1}{2} \frac{1}{\sqrt{1 - u^2}}$. Thus, we conclude that
$$I = 4 \int_0^{\frac{1}{2}} g(u) g'(u) \, du - 8 \int_{\frac{1}{2}}^1 h(u) h'(u) \, du = 2 g^2(u) \Big|_0^{\frac{1}{2}} - 4 h^2(u) \Big|_{\frac{1}{2}}^1$$
$$= 2 \Big( \arctan^2 \frac{1}{\sqrt{3}} - \arctan^2 0 \Big) - 4 \Big( \arctan^2 0 - \arctan^2 \frac{1}{\sqrt{3}} \Big) = 6 \arctan^2 \frac{1}{\sqrt{3}} = 6 \Big( \frac{\pi}{6} \Big)^2 = \frac{\pi^2}{6}. \tag{D.3}$$
Comparing (D.2) and (D.3) gives
$$\sum_{n=1}^\infty \frac{1}{n^2} = \frac{\pi^2}{6}.$$
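The identity is also pleasant to watch converge numerically; since the tail $\sum_{n > N} \frac{1}{n^2}$ is of order $\frac{1}{N}$, convergence is slow but visible. A sketch:

```python
# Partial sums of sum 1/n^2 approach pi^2/6; the remaining gap is
# roughly 1/N after N terms.

from math import pi

target = pi**2 / 6
s = 0.0
for n in range(1, 100001):
    s += 1 / n**2
    if n in (10, 1000, 100000):
        print(n, s, target - s)  # the gap shrinks like 1/n
```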


References

1. Aigner, M., Ziegler, G.: Proofs from the Book, 4th edn. Springer, Berlin (2010)
2. Alon, N., Spencer, J.: The Probabilistic Method, 3rd edn. Wiley-Interscience Series in Discrete Mathematics and Optimization. Wiley, Hoboken (2008)
3. Alon, N., Krivelevich, M., Sudakov, B.: Finding a large hidden clique in a random graph. In: Proceedings of the 9th Annual ACM-SIAM Symposium on Discrete Algorithms (San Francisco, CA, 1998), pp. 594–598. ACM, New York (1998)
4. Andrews, G.: The Theory of Partitions, reprint of the 1976 original. Cambridge University Press, Cambridge (1998)
5. Apostol, T.: Introduction to Analytic Number Theory. Undergraduate Texts in Mathematics. Springer, New York (1976)
6. Arratia, R., Barbour, A.D., Tavaré, S.: Logarithmic Combinatorial Structures: A Probabilistic Approach. EMS Monographs in Mathematics. European Mathematical Society, Zürich (2003)
7. Athreya, K., Ney, P.: Branching Processes, reprint of the 1972 original [Springer, Berlin]. Dover Publications, Inc., Mineola (2004)
8. Bollobás, B.: The evolution of random graphs. Trans. Am. Math. Soc. 286, 257–274 (1984)
9. Bollobás, B.: Modern Graph Theory. Graduate Texts in Mathematics, vol. 184. Springer, New York (1998)
10. Bollobás, B.: Random Graphs, 2nd edn. Cambridge Studies in Advanced Mathematics, vol. 73. Cambridge University Press, Cambridge (2001)
11. Brauer, A.: On a problem of partitions. Am. J. Math. 64, 299–312 (1942)
12. Conlon, D.: A new upper bound for diagonal Ramsey numbers. Ann. Math. 170, 941–960 (2009)
13. Dembo, A., Zeitouni, O.: Large Deviations Techniques and Applications, 2nd edn. Springer, New York (1998)
14. Diaconis, P., Freedman, D.: An elementary proof of Stirling's formula. Am. Math. Mon. 93, 123–125 (1986)
15. Doyle, P., Snell, J.L.: Random Walks and Electric Networks. Carus Mathematical Monographs, vol. 22. Mathematical Association of America, Washington (1984)
16. Durrett, R.: Probability: Theory and Examples, 4th edn. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, Cambridge (2010)
17. Dwass, M.: The number of increases in a random permutation. J. Combin. Theor. Ser. A 15, 192–199 (1973)
18. Erdős, P., Rényi, A.: On the evolution of random graphs. Magyar Tud. Akad. Mat. Kutató Int. Közl. 5, 17–61 (1960)
19. Feller, W.: An Introduction to Probability Theory and Its Applications, vol. I, 3rd edn. Wiley, New York (1968)
20. Flajolet, P., Sedgewick, R.: Analytic Combinatorics. Cambridge University Press, Cambridge (2009)
21. Flory, P.J.: Intramolecular reaction between neighboring substituents of vinyl polymers. J. Am. Chem. Soc. 61, 1518–1521 (1939)
22. Graham, R., Rothschild, B., Spencer, J.: Ramsey Theory, 2nd edn. Wiley-Interscience Series in Discrete Mathematics and Optimization. Wiley, New York (1990)
23. Hardy, G.H., Ramanujan, S.: Asymptotic formulae in combinatory analysis. Proc. London Math. Soc. 17, 75–115 (1918)
24. Harris, T.: The Theory of Branching Processes, corrected reprint of the 1963 original [Springer, Berlin]. Dover Publications, Inc., Mineola (2002)
25. Jameson, G.J.O.: The Prime Number Theorem. London Mathematical Society Student Texts, vol. 53. Cambridge University Press, Cambridge (2003)
26. Montgomery, H., Vaughan, R.: Multiplicative Number Theory. I. Classical Theory. Cambridge Studies in Advanced Mathematics, vol. 97. Cambridge University Press, Cambridge (2007)
27. Nathanson, M.: Elementary Methods in Number Theory. Graduate Texts in Mathematics, vol. 195. Springer, New York (2000)
28. Page, E.S.: The distribution of vacancies on a line. J. Roy. Stat. Soc. Ser. B 21, 364–374 (1959)
29. Pinsky, R.: Detecting tampering in a random hypercube. Electron. J. Probab. 18, 1–12 (2013)
30. Pitman, J.: Combinatorial Stochastic Processes. Lectures from the 32nd Summer School on Probability Theory held in Saint-Flour, July 7–24, 2002. Lecture Notes in Mathematics, vol. 1875. Springer, Berlin (2006)
31. Rényi, A.: On a one-dimensional problem concerning random space filling (Hungarian; English summary). Magyar Tud. Akad. Mat. Kutató Int. Közl. 3, 109–127 (1958)
32. Spitzer, F.: Principles of Random Walk, 2nd edn. Graduate Texts in Mathematics, vol. 34. Springer, New York (1976)
33. Tenenbaum, G.: Introduction to Analytic and Probabilistic Number Theory. Cambridge Studies in Advanced Mathematics, vol. 46. Cambridge University Press, Cambridge (1995)
34. Wilf, H.: Generatingfunctionology, 3rd edn. A K Peters, Ltd., Wellesley (2006)


Index

A
Abel summation, 77
arcsine distribution, 37
average order, 13

B
Bernoulli random variable, 138
binomial random variable, 138
branching process, see Galton–Watson branching process, 117

C
Chebyshev's ψ-function, 70
Chebyshev's θ-function, 68
Chebyshev's inequality, 134
Chebyshev's theorem, 68
Chinese remainder theorem, 19
clique, 89
coloring of a graph, 104
composition of an integer, 5
cycle index, 58
cycle type, 51

D
derangement, 49
Dyck path, 40

E
Erdős–Rényi graph, 89
Euler φ-function, 11
Euler product formula, 19
Ewens sampling formula, 52
expected value, 134
extinction, 117

F
Fibonacci sequence, 142
finite graph, 89

G
Galton–Watson branching process, 117
generating function, 141
giant component, 110

H
Hardy–Ramanujan theorem, 81

I
independent events, 133
independent random variables, 135

L
large deviations, 113

M
Mertens' theorems, 75
Möbius function, 8
Möbius inversion, 10
multiplicative function, 9

P
p-adic, 71
partition of an integer, 1
Poisson approximation to the binomial distribution, 138
Poisson random variable, 138
prime number theorem, 67
probabilistic method, 107
probability generating function, 54
probability space, 133

R
Ramsey number, 105
random variable, 133
relative entropy, 115, 131
restricted partition of an integer, 1

S
sieve method, 19
simple, symmetric random walk, 35
square-free integer, 8
Stirling numbers of the first kind, 54
Stirling's formula, 145
survival, 117

T
tampering detection, 99
total variation distance, 99

V
variance, 134

W
weak convergence, 139
weak law of large numbers, 136, 137