Probability and Random Processes
Jinho Choi, GIST
February 2017
What Albert Einstein said:
- "As I have said so many times, God doesn't play dice with the world."
- "Two things are infinite: the universe and human stupidity."
So, he did believe in infinity, but not in randomness.
Main aims
- Understand fundamental notions of probability theory (pdf, mean, and variance)
- Understand joint and conditional pdfs, independence, and correlation
- Learn properties of Gaussian random variables and be able to derive the Chernoff bound
- Understand random processes and their key definitions
- Be able to compute the mean and variance of samples from random processes
Probability Theory

The three axioms, with a sample space Ω, a family F of allowable events, and a measure Pr(·):
- The first axiom: the probability of an event E is a non-negative real number:
  Pr(E) ≥ 0.
- The second axiom: Pr(Ω) = 1.
- The third axiom: for any countable sequence of mutually exclusive events E_1, E_2, ...,
  Pr(E_1 ∪ E_2 ∪ ···) = \sum_{i=1}^{∞} Pr(E_i).
Random variables: a random variable is a mapping from the sample space Ω to the set of real numbers:

sample space (abstract space) ⇒ real numbers.

The main idea of random variables is to describe random events by numbers.

(Figure: an event ω in the event space Ω is mapped to a real number X(ω))

X: random variable: ω ∈ Ω → X(ω) ∈ (−∞, ∞)
Example of random variables: a game of dice
- Before you roll a die, the number of dots is unknown. ⇒ This number can be considered as a random variable.
- Once the die is rolled, we have a particular number, which would be one of 1, ..., 6. This is called a "realization."

(Figure: dice showing realizations of 2, 3, 4)
Cumulative distribution function (CDF):

F_X(x) = Pr(X ≤ x),

where X is the random variable (r.v.) and x is a real number. By definition, the CDF is a nondecreasing function.

Probability density function (PDF):

f_X(x) = \frac{d}{dx} F_X(x).

Example: a die. The CDF F_X(x) = Pr(X ≤ x) is a staircase function of x that jumps by 1/6 at each of x = 1, 2, ..., 6, taking the values 1/6, 2/6, ..., 5/6, 1.
There are different types of r.v.'s, such as:
- continuous r.v.: X takes continuous values
- discrete r.v.: X takes discrete values
Examples:
- the phase θ of a sinusoid sin(ω_c t + θ) ⇒ continuous r.v.
- the number of dots on a die ⇒ discrete r.v.
A discrete r.v. with binomial distribution
- Consider a random experiment that has two possible outcomes. For example, the outcome of this experiment can be expressed (1 to denote success; 0 to denote failure) as
  Y = 1 with probability p; 0 with probability 1 − p.
- Consider a sum of n outcomes from independent experiments:
  X = \sum_{j=1}^{n} Y_j ∈ Ω = {0, 1, ..., n}.
- Then, X is a binomial random variable with parameters n and p, X ~ B(n, p):
  Pr(X = j) = \binom{n}{j} p^j (1 − p)^{n−j}, j = 0, ..., n.
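The pmf is easy to check by simulation. A minimal MATLAB sketch (not from the slides; n, p, and N are arbitrary choices, and only base MATLAB functions are used):

  % Empirical check of the B(n, p) pmf
  N = 1e5; n = 10; p = 0.3;
  X = sum(rand(n, N) < p, 1);        % N realizations of X = sum of n Bernoulli(p)
  emp = histc(X, 0:n) / N;           % empirical pmf
  thy = arrayfun(@(j) nchoosek(n, j) * p^j * (1 - p)^(n - j), 0:n);
  disp([(0:n)' emp(:) thy(:)])       % the two pmf columns should be close
  mean(X)                            % approx. n*p (see the derivation below)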
A continuous r.v. X has the probability density function (pdf)

f_X(x) = \frac{d}{dx} F_X(x).

- As F_X(x) is nondecreasing, f_X(x) ≥ 0.
- In general, \int_{−∞}^{t} f_X(x) dx = F_X(t).
- Since \lim_{x→∞} F_X(x) = 1, \int_{−∞}^{∞} f_X(x) dx = 1.

For a discrete r.v., the pdf becomes the probability mass function (pmf), which is an actual probability.
- Example of a die:
  f_X(k) → Pr(X = k) = 1/6, k = 1, 2, ..., 6.
Mean and Variance

For a r.v. X, the mean of X (or of g(X), where g(x) is a function of x) is given by
- for a continuous r.v.:
  E[X] = \int x f_X(x) dx or E[g(X)] = \int g(x) f_X(x) dx
- for a discrete r.v.:
  E[X] = \sum_k x_k Pr(x_k) or E[g(X)] = \sum_k g(x_k) Pr(x_k)

The variance is given by

Var(X) = E[(X − E[X])^2].

The mean and variance are used to characterize a random variable.
Mean of X ~ B(n, p):

E[X] = \sum_{j=0}^{n} j \binom{n}{j} p^j (1 − p)^{n−j}
     = \sum_{j=1}^{n} \frac{j n!}{j!(n−j)!} p^j (1 − p)^{n−j}
     = \sum_{j=1}^{n} \frac{n(n−1)!}{(j−1)!(n−j)!} p \, p^{j−1} (1 − p)^{n−j}
     = np \sum_{j=1}^{n} \frac{(n−1)!}{(j−1)!(n−j)!} p^{j−1} (1 − p)^{n−j}
     = np (p + (1 − p))^{n−1} = np.
Geometric random variable:
- pmf: Pr(X = k) = (1 − p)^{k−1} p, k ≥ 1. Note that
  \sum_{k=1}^{∞} (1 − p)^{k−1} p = p \sum_{k=0}^{∞} (1 − p)^k = p \frac{1}{1 − (1 − p)} = 1.
- Example: the number of independent flips of a coin until a head first appears.
- Mean: letting q = 1 − p, it can be shown that
  E[X] = \sum_{k=1}^{∞} k (1 − p)^{k−1} p = p \frac{d}{dq} \sum_{k=0}^{∞} q^k = p \frac{d}{dq} \frac{1}{1 − q} = p \frac{1}{(1 − q)^2} = \frac{1}{p}.
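Geometric samples can be drawn by inverting the cdf F(k) = 1 − (1 − p)^k, anticipating the transformation method discussed later. A minimal MATLAB sketch (not from the slides; p and N are arbitrary):

  % Inverse-transform sampling of the geometric distribution
  p = 0.25; N = 1e5;
  X = ceil(log(rand(N, 1)) ./ log(1 - p));   % U and 1-U have the same distribution
  mean(X)                                    % should be close to 1/p = 4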
A continuous r.v. with uniform distribution

Let us consider a uniform r.v. X that has the pdf

f_X(x) = 1/A for 0 ≤ x ≤ A; 0 otherwise.

The mean is

E[X] = \int_0^A x \frac{1}{A} dx = \frac{1}{A} \frac{x^2}{2} \Big|_{x=0}^{A} = \frac{A}{2}.

The variance is

E[(X − E[X])^2] = \int_0^A \left(x − \frac{A}{2}\right)^2 \frac{1}{A} dx = \int_{−A/2}^{A/2} z^2 \frac{1}{A} dz = \frac{1}{A} \cdot \frac{1}{3} z^3 \Big|_{−A/2}^{A/2} = \frac{A^2}{12}.
More examples on expectation

Q) Let X be a r.v. and let a and c be constants. Show that E[aX] = a E[X] and E[X + c] = E[X] + c.
A)

E[aX] = \int a x f_X(x) dx = a \int x f_X(x) dx = a E[X]
E[X + c] = \int (x + c) f_X(x) dx = \int x f_X(x) dx + \int c f_X(x) dx = E[X] + c

Q) Show that E[X^2] = Var(X) + (E[X])^2.
A)

E[(X − E[X])^2] = E[X^2 − 2 E[X] X + (E[X])^2] = E[X^2] − 2 E[X] E[X] + (E[X])^2 = E[X^2] − (E[X])^2.
Q) Show that Var(aX + c) = a^2 Var(X).

Q) Suppose that X is a r.v. with mean 1 and variance 3. Find E[3X^2 + 2X].
A)

E[3X^2 + 2X] = 3 E[X^2] + 2 E[X] = 3 (Var(X) + (E[X])^2) + 2 E[X] = 3 × (3 + 1^2) + 2 × 1 = 14.
Jensen's inequality: for a convex function g(x),

E[g(X)] ≥ g(E[X]).

A convex function satisfies

λ g(x_1) + (1 − λ) g(x_2) ≥ g(λ x_1 + (1 − λ) x_2), λ ∈ [0, 1].

For example, with the convex function g(x) = x^2, Jensen's inequality gives E[X^2] ≥ (E[X])^2, consistent with Var(X) ≥ 0.
Gaussian or normal random variable:

f_X(x) = N(µ, σ^2) = \frac{1}{\sqrt{2π} σ} \exp\left(−\frac{(x − µ)^2}{2σ^2}\right),

where the mean is

E[X] = \int_{−∞}^{∞} x f_X(x) dx = µ

and the variance is

E[(X − E[X])^2] = \int_{−∞}^{∞} (x − µ)^2 f_X(x) dx = σ^2.
(Figure: normal or Gaussian pdfs)

(Figure: normal or Gaussian cdfs)
Q-function: the tail of the standard normal pdf. For X ~ N(0, 1),

Pr(X ≥ x) = Q(x) ≜ \int_x^{∞} \frac{1}{\sqrt{2π}} \exp\left(−\frac{t^2}{2}\right) dt.
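In MATLAB, Q(x) can be evaluated with base functions through the identity Q(x) = (1/2) erfc(x/√2); the dedicated qfunc is, to my knowledge, part of the Communications Toolbox. A small sketch:

  % Q-function via the complementary error function
  Qfun = @(x) 0.5 * erfc(x ./ sqrt(2));
  Qfun([0 1 2 3])   % approx. 0.5000, 0.1587, 0.0228, 0.0013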
Conditional probability

The conditional probability of an event A given B is denoted and given by

Pr(A | B) = \frac{Pr(A, B)}{Pr(B)}.

Q) Find the probability that the face with one dot occurs given that an odd number of dots is observed.
A) We have

Pr(odd) = \frac{1}{2} and Pr(1, odd) = Pr(1) = \frac{1}{6}.

Hence, it follows that

Pr(1 | odd) = \frac{Pr(1, odd)}{Pr(odd)} = \frac{1}{3}.
Multiple random variables: the joint pdf is written as

f_{XY}(x, y) = \frac{∂^2}{∂x ∂y} F_{XY}(x, y) = \frac{∂^2}{∂x ∂y} Pr(X ≤ x, Y ≤ y).

Conditional pdf:

f_{X|Y}(x|y) = \frac{f_{XY}(x, y)}{f_Y(y)} if f_Y(y) ≠ 0; 0 otherwise.

Marginalization:

f_X(x) = \int f_{XY}(x, y) dy or f_Y(y) = \int f_{XY}(x, y) dx.
Expectation with two r.v.'s: for continuous r.v.'s, a double integral is needed:

E[g(X, Y)] = \int\int g(x, y) f_{XY}(x, y) dx dy (continuous)
E[g(X, Y)] = \sum_x \sum_y g(x, y) Pr(X = x, Y = y) (discrete).

The conditional expectation is defined by

E[X | Y] = \int x f_{X|Y}(x|Y) dx (continuous)
E[X | Y] = \sum_x x Pr(X = x | Y) (discrete).

Note that E[X | Y] = g(Y) is a function of Y, which is itself a random variable.
Q) Show that E[XY] = E[Y × E[X | Y]].
A)

E[XY] = \int\int x y f_{XY}(x, y) dx dy
      = \int\int x y f_{X|Y}(x|y) f_Y(y) dx dy
      = \int y \left(\int x f_{X|Y}(x|y) dx\right) f_Y(y) dy
      = \int y E[X | y] f_Y(y) dy = E[Y E[X | Y]].
Exponential distribution and memorylessness
- The exponential distribution is given by
  f(x; λ) = λ e^{−λx} if x ≥ 0; 0 otherwise.
- The mean and variance are 1/λ and 1/λ^2, respectively.
- An exponentially distributed random variable T obeys the following relation:
  Pr(T > s + t | T > s) = Pr(T > t), s, t ≥ 0.
  This property is called memorylessness. (It follows since Pr(T > t) = e^{−λt}, so Pr(T > s + t)/Pr(T > s) = e^{−λ(s+t)}/e^{−λs} = e^{−λt}.)
Independence and correlation

The joint cdf (of more than 2 r.v.'s) is written as

F_{X_1,...,X_n}(x_1, ..., x_n) = Pr(X_1 ≤ x_1, ..., X_n ≤ x_n).

The joint pdf is given by

f_{X_1,...,X_n}(x_1, ..., x_n) = \frac{∂^n}{∂x_1 ··· ∂x_n} F_{X_1,...,X_n}(x_1, ..., x_n).

The marginal pdf is given by

f_{X_1}(x_1) = \int_{x_2} ··· \int_{x_n} f_{X_1,...,X_n}(x_1, ..., x_n) dx_2 ··· dx_n.

If X_1, ..., X_n are independent, then

F_{X_1,...,X_n}(x_1, ..., x_n) = F_{X_1}(x_1) ··· F_{X_n}(x_n)

and

f_{X_1,...,X_n}(x_1, ..., x_n) = f_{X_1}(x_1) ··· f_{X_n}(x_n).
- Note: if X and Y are independent,
  f_{X|Y}(x|y) = \frac{f_{XY}(x, y)}{f_Y(y)} = f_X(x).
- The correlation of X and Y is defined as
  corr(X, Y) = E[XY]
  and the covariance is defined as
  cov(X, Y) = E[(X − E[X])(Y − E[Y])].
  If cov(X, Y) = 0, X and Y are said to be uncorrelated (not "uncovarianced").
Joint Gaussian random variables: let us define the random vector

x = [X_1 X_2 ··· X_n]^T.

The random variables X_i are jointly Gaussian if the pdf can be written as

f(x_1, x_2, ..., x_n) = \frac{1}{\sqrt{(2π)^n \det(C)}} \exp\left(−\frac{1}{2} (x − m)^T C^{−1} (x − m)\right),

where the mean vector is m = E[x] = [E[X_1] ... E[X_n]]^T and the covariance matrix is

C = E[(x − m)(x − m)^T],

whose (i, j)th entry is E[X_i X_j] − E[X_i] E[X_j].
Moment Generating Function (MGF)
- Goal: finding m_n = E[X^n] (the nth moment).
- Define the MGF as
  M_X(t) = E[e^{tX}].
- Using the Taylor series,
  M_X(t) = E\left[\sum_{k=0}^{∞} \frac{1}{k!} (tX)^k\right] = \sum_{k=0}^{∞} \frac{t^k m_k}{k!}.
- From this, the kth moment can be found as
  m_k = \frac{d^k}{dt^k} M_X(t) \Big|_{t=0}.
- The MGF can be seen as a (two-sided) Laplace transform of the pdf f_X(x), evaluated at −t:
  M_X(t) = \int e^{tx} f_X(x) dx.
  Thus, using the inverse Laplace transform, the pdf can be found from the MGF.
- Let X and Y be independent random variables. The MGF of Z = X + Y is given by
  M_Z(t) = E[e^{tZ}] = E[e^{t(X+Y)}] = M_X(t) M_Y(t).
  Thus, the pdf of Z is the inverse Laplace transform of M_X(t) M_Y(t).
- Equivalently, the pdf of Z is the convolution of the pdfs of X and Y: f_Z(x) = f_X(x) ∗ f_Y(x).
Sum of independent Gaussian random variables: let X_i be independent Gaussian r.v.'s, X_i ~ N(µ_i, σ_i^2). Then,

Y = \sum_i X_i

is also a Gaussian r.v.
- The MGF of X_i is M_i(t) = \exp(µ_i t + \frac{1}{2} σ_i^2 t^2).
- The MGF of Y is
  M_Y(t) = \prod_i M_i(t) = \exp\left(\sum_i µ_i t + \frac{1}{2} \sum_i σ_i^2 t^2\right)
  ⇒ Y ~ N\left(\sum_i µ_i, \sum_i σ_i^2\right).
Gaussian related distributions
- χ^2 distribution: the pdf of the χ^2 distribution with N degrees of freedom is written as
  f_X(x) = \frac{1}{Γ(N/2) 2^{N/2}} x^{(N/2)−1} e^{−x/2}, x ≥ 0,
  where Γ(x) = \int_0^∞ t^{x−1} e^{−t} dt is the Gamma function. If x = n is an integer,
  Γ(n) = (n − 1)!.
  Let X_1, X_2, ..., X_N be independent and identically distributed (i.i.d.) random variables (r.v.'s) with X_i ~ N(0, 1). Then, Y = \sum_{i=1}^{N} X_i^2 is the χ^2 r.v. with N degrees of freedom.
(Figure: χ^2 pdfs for several degrees of freedom)
- The χ^2 distribution with 2 degrees of freedom is an exponential distribution:
  f_X(x) = \frac{1}{2} e^{−x/2}, x ≥ 0.
- The exponential distribution with parameter λ (denoted by X ~ Exp(λ)) is given by
  f(x; λ) = λ e^{−λx} for x ≥ 0; 0 for x < 0.
- The mean and variance of X ~ Exp(λ) are
  E[X] = \frac{1}{λ}, Var(X) = \frac{1}{λ^2}.
- Let X_1 and X_2 be i.i.d. Gaussian r.v.'s with X_i ~ N(0, 1). Define Y = X_1 + jX_2. Then, Y is a circularly symmetric complex Gaussian (CSCG) r.v. The mean, pseudo-variance, and variance are
  E[Y] = 0, E[Y^2] = 0, and E[|Y|^2] = 2.
- From a zero-mean CSCG random vector to a real-valued Gaussian random vector:
  y = x_1 + jx_2 ⇒ [x_1; x_2] ~ N([0; 0], [E[x_1 x_1^T], E[x_1 x_2^T]; E[x_2 x_1^T], E[x_2 x_2^T]]).
- Then, what is the distribution of z = Ay if y is a zero-mean CSCG random vector?
- The real-valued representation of A:
  A → A_r = [Re(A), −Im(A); Im(A), Re(A)].
- Noting that
  z → [Re(z); Im(z)] = A_r [Re(y); Im(y)] = A_r [x_1; x_2],
  it can be shown that
  [Re(z); Im(z)] ~ N([0; 0], A_r [E[x_1 x_1^T], E[x_1 x_2^T]; E[x_2 x_1^T], E[x_2 x_2^T]] A_r^T).
Remarks:
- Let X be a CSCG r.v. Then, |X|^2 becomes a χ^2 r.v. (with 2 degrees of freedom).
- The Gamma function:
  Γ(n) = \int_0^∞ t^{n−1} e^{−t} dt, n > 0.
  Some properties:
  Γ(n + 1) = n Γ(n)
  and, for integer n,
  Γ(n + 1) = n!.
  In addition,
  Γ(1/2) = \sqrt{π}.
Rician and Rayleigh random variables
- The Rayleigh r.v. has a single-parameter pdf given by
  f_X(x) = \frac{x}{σ^2} e^{−x^2/2σ^2}, x ≥ 0,
  and the cdf is
  F_X(x) = 1 − e^{−x^2/2σ^2}, x ≥ 0.
  If X_1 and X_2 are independent Gaussian random variables with mean 0 and variance σ^2, then X = \sqrt{X_1^2 + X_2^2} is a Rayleigh r.v.
- The Rician pdf is written as
  f_X(x) = \frac{x}{σ^2} I_0\left(\frac{µx}{σ^2}\right) e^{−(x^2 + µ^2)/2σ^2}, x ≥ 0,
  where I_0(x) is the zero-order modified Bessel function of the first kind. It can be shown that if X_1 and X_2 are independent Gaussian, with X_1 ~ N(µ, σ^2) and X_2 ~ N(0, σ^2), then X = \sqrt{X_1^2 + X_2^2} is a Rician r.v.
Approximations: from binomial to Poisson to Gaussian
- A (discrete) Poisson random variable with parameter λ, denoted by Pois(λ), has the following distribution:
  Pr(X = k) = \frac{e^{−λ} λ^k}{k!}, k = 0, 1, ...
  The following facts are often useful:
- A binomial random variable ~ B(n, p) can be approximated by a Poisson random variable with λ = np if n is sufficiently large (and p is small).
- A Poisson random variable with parameter λ can be approximated by a Gaussian random variable if λ is large.
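A quick numerical comparison of the two pmfs; a MATLAB sketch (not from the slides; n, p, and the range of k are arbitrary choices):

  % Poisson approximation of B(n, p) for large n and small p
  n = 100; p = 0.03; lambda = n * p; k = 0:10;
  binom = arrayfun(@(j) nchoosek(n, j) * p^j * (1 - p)^(n - j), k);
  poiss = exp(-lambda) .* lambda.^k ./ factorial(k);
  disp([k' binom' poiss'])   % the two pmfs should agree closely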
Limit of X ~ B(n, p) (taking a heuristic approach):

Pr(X = k) = \binom{n}{k} p^k (1 − p)^{n−k}
          ≈ \frac{n^k}{k!} p^k \frac{(1 − p)^n}{(1 − p)^k} (for large n)
          ≈ \frac{(np)^k}{k!} \frac{e^{−pn}}{e^{−pk}} (using (1 − x) ≈ e^{−x} for |x| ≪ 1).

Let λ = pn and take n → ∞, which means that p approaches 0 (so e^{−pk} → 1). Then,

Pr(X = k) → \frac{λ^k e^{−λ}}{k!},

which is the Poisson distribution with parameter λ.
Sum of independent Poisson r.v.'s is also Poisson
- The MGF of Pois(λ):
  M_X(t) = E[e^{tX}] = \sum_{k=0}^{∞} e^{tk} \frac{λ^k e^{−λ}}{k!} = e^{−λ} \sum_{k=0}^{∞} \frac{(e^t λ)^k}{k!} = e^{−λ} e^{e^t λ} = \exp(λ(e^t − 1)).
- Z = X + Y, where X ~ Pois(λ_X) and Y ~ Pois(λ_Y), has the MGF
  M_Z(t) = M_X(t) M_Y(t) = e^{λ_X(e^t − 1)} e^{λ_Y(e^t − 1)} = e^{(λ_X + λ_Y)(e^t − 1)},
  which leads to Z ~ Pois(λ_X + λ_Y).
Standardized Poisson random variable approaches Gaussian
- Let X ~ Pois(λ).
- E[X] = λ and Var(X) = λ (check these!).
- The standardized Poisson r.v. becomes
  \frac{X − E[X]}{\sqrt{Var(X)}} = \frac{X − λ}{\sqrt{λ}} (→ Z ~ N(0, 1) as λ → ∞).
- The MGF is
  E[e^{t(X−λ)/\sqrt{λ}}] = E[e^{(t/\sqrt{λ})X}] e^{−t\sqrt{λ}} = \exp(λ(e^{t/\sqrt{λ}} − 1)) e^{−t\sqrt{λ}}
  = \exp\left(λ\left(\frac{t}{\sqrt{λ}} + \frac{t^2}{2!λ} + \frac{t^3}{3!λ^{3/2}} + ...\right) − t\sqrt{λ}\right)
  → \exp\left(\frac{t^2}{2}\right) as λ → ∞,
  which is the MGF of Z ~ N(0, 1).
Transformation Method
- Suppose that U is uniformly distributed in the interval [0, 1].
- From U, we may want to generate a r.v. X that has cdf F_X(x).
- To this end, define
  Z = F_X^{−1}(U).
- Then, since Pr(U < t) = t for t ∈ [0, 1],
  Pr(Z ≤ x) = Pr(F_X^{−1}(U) ≤ x) = Pr(U ≤ F_X(x)) = F_X(x).
- That is, Z is a r.v. with cdf F_X(x).
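For instance, the exponential cdf inverts in closed form, F^{−1}(u) = −ln(1 − u)/λ. A minimal MATLAB sketch (not from the slides; λ and N are arbitrary, and −log(U) is used since U and 1 − U have the same distribution):

  % Transformation-method sampling of Exp(lambda)
  lambda = 2; N = 1e5;
  X = -log(rand(N, 1)) / lambda;
  [mean(X) var(X)]   % approx. 1/lambda = 0.5 and 1/lambda^2 = 0.25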
Example: consider a circle of radius R. We want to generate r.v.'s that are uniformly distributed over the area of this circle.
- Using polar coordinates, let the r.v. uniformly distributed over the circle, denoted by Y, be
  Y = X e^{jθ},
  where θ ~ U(0, 2π) and X ~ f_X(x) = \frac{2x}{R^2}, 0 ≤ x ≤ R.
- Since F_X(x) = \frac{x^2}{R^2},
  X = F_X^{−1}(U) = R\sqrt{U}.
- MATLAB script:
  >> theta = (2*pi).*rand(N,1);
  >> X = R.*sqrt(rand(N,1));
(Figure: R = 2, scatter plot of 1000 samples uniformly distributed over the circle)
Monte Carlo Method
- The Monte Carlo method is a set of approaches that perform parameter estimation by simulating probability models.
- It is useful when analytical means (i.e., pencil-and-paper methods) are not available.
- It can be used for fairly complex cases such as communication systems and networks.
- In the Monte Carlo method, the generation of r.v.'s is very important.
(Figure: R = 1, N = 1000 samples from the uniform distribution over the square [0, 1] × [0, 1])

π ≈ \frac{M/N}{R^2/4}, where M is the number of samples whose length (distance from the origin) is less than or equal to R among the N samples.
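A minimal MATLAB sketch of this estimator (not from the slides; N is an arbitrary choice):

  % Monte Carlo estimate of pi with R = 1 and the unit square
  N = 1e6;
  x = rand(N, 1); y = rand(N, 1);
  M = sum(x.^2 + y.^2 <= 1);   % samples inside the quarter circle
  pi_est = 4 * M / N           % approx. 3.14 for large N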
Central Limit Theorem (CLT)
- Let X_i be a sequence of independent and identically distributed (iid) random variables with E[X_i] = µ and Var(X_i) = σ^2 < ∞.
- Then,
  \sqrt{n}\left(\left(\frac{1}{n} \sum_{i=1}^{n} X_i\right) − µ\right) \xrightarrow{d} N(0, σ^2),
  i.e., the distribution of \sqrt{n}((1/n) \sum_{i=1}^{n} X_i − µ) converges to the zero-mean Gaussian distribution with the same variance σ^2.
- Roughly, it says that a (normalized) sum of independent random variables converges to a Gaussian random variable.
- There are variations of the CLT where the independence and identical-distribution conditions are relaxed.
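A quick illustration with uniform r.v.'s; a MATLAB sketch (not from the slides; n and N are arbitrary, and the mean and variance of U[0, 1] are 1/2 and 1/12):

  % CLT: standardized sample means of U[0,1] r.v.'s approach N(0, 1)
  n = 30; N = 1e5;
  mu = 0.5; sigma = sqrt(1/12);
  Z = sqrt(n) * (mean(rand(n, N), 1) - mu) / sigma;
  [mean(Z) var(Z)]   % approx. 0 and 1
  hist(Z, 50)        % the histogram should look Gaussian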
Since the pdf of Z = X + Y is the convolution of those of X and Y, we can perform the convolution repeatedly to see the pdf of a sum of X_i's. Starting from an essentially arbitrary pdf (the top-left panel), the repeated convolutions approach a Gaussian shape.

(Figure: four panels showing a pdf repeatedly convolved with itself, approaching a bell shape)
Chernoff bound: suppose that u(x) is the step function, where u(x) = 1 if x ≥ 0, and u(x) = 0 otherwise. Then, it follows that

Pr(X > x) = \int_x^{∞} f_X(z) dz = \int_{−∞}^{∞} u(z − x) f_X(z) dz = E[u(X − x)].

Since u(x) ≤ e^x, it can be shown that

Pr(X > x) ≤ E[e^{X−x}].

More generally,

Pr(X > x) ≤ E[e^{λ(X−x)}], λ ≥ 0.

(Figure: the step function u(x − a) upper-bounded by e^{x−a})
The Chernoff bound is

Pr(X > x) ≤ \min_{λ>0} e^{−λx} E[e^{λX}] = \min_{λ>0} e^{−λx} M_X(λ).

- As long as a random variable has an MGF, the Chernoff bound can be found.
- In some cases, the Chernoff bound for a sample mean, X̄_n = \frac{1}{n} \sum_{i=1}^{n} X_i, of iid X_i's can be expressed as
  Pr(X̄_n > x) ≈ e^{−n ℓ(x)}, x > E[X_i],
  where ℓ(x) is called the rate function. This means that the tail probability decreases exponentially with n. The related theory is called large deviations theory, which is about rare events.
Chernoff bound for Gaussian random variables

Let X be a Gaussian r.v. with mean µ and variance σ^2. Then,

E[e^{λX}] = \frac{1}{\sqrt{2π}σ} \int_{−∞}^{∞} e^{λx} e^{−\frac{1}{2σ^2}(x−µ)^2} dx.

It can be shown that

e^{λx} e^{−\frac{1}{2σ^2}(x−µ)^2} = \exp\left(−\frac{1}{2σ^2}\left((x − µ)^2 − 2σ^2 λx\right)\right)
= \exp\left(−\frac{1}{2σ^2}\left((x − µ − σ^2 λ)^2 − 2σ^2 λµ − σ^4 λ^2\right)\right).
Thus,

E[e^{λX}] = \frac{1}{\sqrt{2π}σ} \int_{−∞}^{∞} e^{λx} e^{−\frac{1}{2σ^2}(x−µ)^2} dx
= \frac{1}{\sqrt{2π}σ} \int_{−∞}^{∞} \exp\left(−\frac{1}{2σ^2}\left((x − µ − σ^2λ)^2 − 2σ^2λµ − σ^4λ^2\right)\right) dx
= \frac{1}{\sqrt{2π}σ} \int_{−∞}^{∞} \exp\left(−\frac{1}{2σ^2}(x − µ − σ^2λ)^2\right) dx × e^{λµ + \frac{1}{2}λ^2σ^2}
= e^{λµ + \frac{1}{2}λ^2σ^2}.
The Chernoff bound is written as

Pr(X > x) ≤ \min_{λ≥0} E[e^{λ(X−x)}] = \min_{λ≥0} e^{λ(µ−x) + \frac{1}{2}λ^2σ^2}.

We have the minimum at λ_opt = \frac{x − µ}{σ^2}, which gives

Pr(X > x) ≤ e^{−\frac{(x−µ)^2}{2σ^2}}.

From this, with X ~ N(0, 1), we have

Pr(X > x) = Q(x) ≤ e^{−x^2/2}.

For performance analysis of communication systems, we often use the Q-function. Since the Q-function is defined by an integral, the Chernoff bound, which is in closed form, can simplify the performance analysis.
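The bound is easy to inspect numerically; a MATLAB sketch (not from the slides; the grid of x values is arbitrary):

  % Chernoff bound vs. the exact Q-function for N(0, 1)
  x  = 0:0.5:4;
  Q  = 0.5 * erfc(x / sqrt(2));   % exact tail probability
  cb = exp(-x.^2 / 2);            % Chernoff bound
  disp([x' Q' cb'])               % the bound lies above Q(x) everywhere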
Law of Large Numbers (LLN)
- Two versions: the Strong LLN (SLLN) and the Weak LLN (WLLN).
- With X_i, consider the sample mean X̄_n = \frac{1}{n} \sum_{i=1}^{n} X_i, where E[X_i] = µ.
- The WLLN says
  X̄_n → µ in probability,
  or, for any ε > 0,
  \lim_{n→∞} Pr(|X̄_n − µ| > ε) = 0.
- The SLLN says
  X̄_n → µ with probability 1,
  or
  Pr(\lim_{n→∞} X̄_n = µ) = 1.
- SLLN → WLLN, but not WLLN → SLLN.
- The sample mean of an iid random sequence of finite variance follows the SLLN.
- Relation to the Chernoff bound. Example: consider the sample mean of Gaussian random variables, X̄_n = \frac{1}{n} \sum_{i=1}^{n} X_i ~ N(µ, \frac{σ^2}{n}). The Chernoff bound becomes
  Pr(X̄_n > µ + ε) ≤ e^{−\frac{nε^2}{2σ^2}} or Pr(|X̄_n − µ| > ε) ≤ 2e^{−\frac{nε^2}{2σ^2}}.
  As n → ∞, e^{−nε^2/2σ^2} → 0 for any ε > 0. From this, we can show that X̄_n follows the WLLN.
A tour to the SLLN (with the sample mean of Gaussian iid r.v.'s):
- Define the event
  E_n = {|X̄_n − µ| > ε}.
- Then, we have
  \sum_{n=1}^{∞} Pr(E_n) ≤ 2 \sum_{n=1}^{∞} \left(e^{−\frac{ε^2}{2σ^2}}\right)^n = \frac{2}{1 − e^{−ε^2/2σ^2}} < ∞, σ^2 < ∞.
- Thus, according to the Borel-Cantelli lemma,
  Pr(E_n infinitely often (i.o.)) = 0.
- This means that only finitely many of the E_n's occur. Thus, as n → ∞, X̄_n must approach µ w.p. 1 (hence the SLLN).
Note that convergence in probability does not imply convergence with probability 1 (the other way around is correct).
- With constants A and c > 0, for a given n, define X_n as
  X_n = A with probability 1 − \frac{\ln n}{n}; A + c with probability \frac{\ln n}{2n}; A − c with probability \frac{\ln n}{2n}.
- Since \lim_{n→∞} \frac{\ln n}{n} = 0 and
  Pr(|X_n − A| > 0) = Pr(X_n ≠ A) = \frac{\ln n}{n},
  X_n converges to E[X_n] = A in probability, as
  \lim_{n→∞} Pr(|X_n − A| > ε) ≤ \lim_{n→∞} \frac{\ln n}{n} = 0.
- However, there are infinitely many X_n ≠ A (there are about ln n many X_i among X_1, ..., X_n that are not A). Thus, X_n does not converge with probability 1.
- This can also be confirmed (heuristically) as
  \sum_{n=1}^{∞} Pr(E_n) = \sum_{n=1}^{∞} \frac{\ln n}{n} ≈ \int_1^{∞} \frac{\ln x}{x} dx = \frac{\ln^2 x}{2} \Big|_1^{∞} = ∞,
  which implies that the event E_n = {|X_n − A| > ε} occurs infinitely often. Thus, X_n does not converge with probability 1.
Types of Convergence

Complete convergence (\sum_n Pr(|X_n − X| > ε) < ∞) → almost sure convergence (Pr(X_n → X) = 1) → convergence in probability (Pr(|X_n − X| > ε) → 0) → convergence in distribution (F_{X_n}(x) → F_X(x)).

Convergence in the pth mean (E[|X_n − X|^p] → 0) also implies convergence in probability.
Order statistics
- For a given set of random variables X_1, ..., X_n, the order statistic of rank k is the kth smallest value in the data set, denoted by X_{(k)}.
- That is,
  X_{(1)} ≤ ... ≤ X_{(n)}.
- Extreme order statistics:
  X_{(1)} = min{X_1, ..., X_n}, X_{(n)} = max{X_1, ..., X_n}.
- The distributions of X_{(k)} are well known if the X_k's are iid.
- Denote by F(x) and f(x) the cdf and pdf of X.
- The cdf of X_{(k)}:
  F_{(k)}(x) = Pr(X_{(k)} ≤ x) = Pr(N_x ≥ k) = \sum_{j=k}^{n} \binom{n}{j} F^j(x) (1 − F(x))^{n−j},
  where N_x is the number of X_k's that are less than or equal to x:
  N_x = \sum_{k=1}^{n} 1(X_k ≤ x).
- The cdfs of X_{(1)} and X_{(n)}:
  F_{(1)}(x) = 1 − (1 − F(x))^n, F_{(n)}(x) = F^n(x).
- The pdf of X_{(k)}:
  f_{(k)}(x) = \frac{n!}{(k−1)!(n−k)!} F^{k−1}(x) (1 − F(x))^{n−k} f(x).
Example: suppose that X_k ~ Exp(1), f(x) = e^{−x}, x ≥ 0.
- There are n X_k's. Find Pr(max_k X_k ≥ x) when x is sufficiently large.
- Sol: since
  Pr(X_{(n)} ≤ x) = F_{(n)}(x) = (1 − e^{−x})^n,
  it can be shown that
  Pr(max_k X_k ≥ x) = 1 − \left(1 − \frac{1}{e^x}\right)^n ≈ 1 − \exp\left(−\frac{n}{e^x}\right) ≈ 1 − \left(1 − \frac{n}{e^x}\right) = n e^{−x} = n Pr(X ≥ x).
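A simulation check of this tail approximation; a MATLAB sketch (not from the slides; n, N, and x are arbitrary choices, and Exp(1) samples are drawn as −log(U)):

  % Tail of the max of n iid Exp(1) r.v.'s vs. the n*exp(-x) approximation
  n = 20; N = 1e5; x = 8;
  M = max(-log(rand(n, N)), [], 1);   % N realizations of the maximum
  emp = mean(M >= x)                  % empirical tail probability
  apx = n * exp(-x)                   % approx. 0.0067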
Random Processes

- Continuous-time functions with random properties
- Discrete-time functions with random properties (also called time series, usually in economics)
- In this lecture, we focus only on a few random processes, namely Markov processes and wide sense stationary processes.
Markov processes or chains
- A random sequence (a discrete-time random process) with the following property:
  Pr(X_t | X_{t−1}, ..., X_0) = Pr(X_t | X_{t−1}).
- Different classes of X_t ∈ X:
  1. |X| is finite: a finite number of states
  2. |X| is countably infinite, e.g., X = Z
  3. |X| is uncountably infinite, e.g., X = R
- For cases 1 and 2, a transition matrix,
  P_{i,j} = Pr(X_t = j | X_{t−1} = i),
  characterizes the given (time-homogeneous) Markov process.
(Figure: a Markov chain with two states)
- Let
  P_{i,j}^n = Pr(X_{t+n} = j | X_t = i), n ≥ 0.
- The Chapman-Kolmogorov equation is
  P_{i,j}^{n+m} = \sum_k P_{i,k}^n P_{k,j}^m, n, m ≥ 0.
- Let P(n) denote the matrix of n-step transition probabilities, P(n) = [P_{i,j}^n]. Then,
  P(n + m) = P(n) P(m).
- State j is said to be accessible from state i if P_{i,j}^n > 0 for some n ≥ 0, denoted by i → j.
- If i → j and j → i, the two states i and j are said to communicate, denoted by i ↔ j.
- Communication is an equivalence relation; it can be shown that
  i ↔ i;
  if i ↔ j, then j ↔ i;
  if i ↔ j and j ↔ k, then i ↔ k.
- Two states that communicate are said to be in the same class.
- Thus, any two classes in a Markov chain are either disjoint or identical.
- If a Markov chain has one class, it is said to be irreducible.
- Period:
  d(i) = gcd{n > 0 : P_{i,i}^n > 0},
  where "gcd" represents the greatest common divisor.
- If d(i) = 1, state i is aperiodic.
- If i ↔ j, then d(i) = d(j).
- If a Markov chain is irreducible and one state is aperiodic, all the states are aperiodic.
- Hitting time: the first time the chain returns to the initial state i,
  T_i = inf{n : X_n = i | X_0 = i}.
- If there is a positive probability that the process never returns to i starting from X_0 = i, i.e.,
  Pr(T_i < ∞) = \sum_{n=1}^{∞} Pr(T_i = n) < 1,
  state i is said to be transient.
- If state i is not transient, it is recurrent.
- Mean recurrence time:
  M_i = E[T_i] = \sum_{n=1}^{∞} n Pr(T_i = n).
- If M_i is finite, state i is said to be positive recurrent; otherwise, state i is said to be null recurrent.
- Suppose that X is irreducible and aperiodic.
  - All states are transient, or all are positive recurrent, or all are null recurrent.
  - An equilibrium (or stationary) probability distribution π exists if and only if all states are positive recurrent:
    π = πP (π_k = \sum_i π_i P_{i,k}).
- In a finite Markov chain:
  - at least one state is recurrent; and
  - all the recurrent states are positive recurrent.
- Any finite, irreducible, and aperiodic Markov chain has all states positive recurrent and aperiodic (called ergodic states); such a chain is called a finite ergodic Markov chain.
- A finite ergodic Markov chain has a unique stationary distribution; a small computational sketch follows.
- A Markov chain is called reversible if there exists a distribution π such that
  π_i P_{i,j} = π_j P_{j,i}.
  This distribution is also the stationary distribution.
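The stationary distribution can be computed as the left eigenvector of P for eigenvalue 1. A minimal MATLAB sketch (not from the slides; the two-state matrix is an arbitrary example):

  % Stationary distribution: pi = pi*P, i.e., pi' is an eigenvector of P'
  P = [0.9 0.1; 0.4 0.6];
  [V, D] = eig(P');
  [~, k] = min(abs(diag(D) - 1));   % eigenvalue closest to 1
  ppi = V(:, k) / sum(V(:, k));     % normalize to a probability vector
  disp(ppi')                        % here: 0.8 0.2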
- In a finite ergodic Markov chain,
  \sum_{j≠i} π_j P_{j,i} = \sum_{j≠i} π_i P_{i,j}.
- This implies that the probability that the chain leaves a state equals the probability that it enters the state.
- Proof: since π is the stationary distribution,
  π_i = \sum_j π_j P_{j,i} and π_i = π_i \sum_j P_{i,j},
  so
  \sum_{j≠i} π_j P_{j,i} + π_i P_{i,i} = \sum_{j≠i} π_i P_{i,j} + π_i P_{i,i}.
Examples of Markov processes:
- The state of a buffer:
  Q_t = max{0, Q_{t−1} + A_t − D_t},
  where Q_t is the state of the buffer, A_t is the arrival process, and D_t is the departure process.
- Autoregressive (AR) process: the AR(1) process (simulated below) is given by
  X_t = a X_{t−1} + W_t,
  where a is the AR coefficient and W_t is a white noise process.
- Birth-death process (a continuous-time Markov process)
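A minimal MATLAB sketch of an AR(1) sample path (not from the slides; a and T are arbitrary, and W_t is taken as Gaussian):

  % AR(1) sample path: the next state depends only on the current one
  a = 0.9; T = 1000;
  W = randn(T, 1);                 % white Gaussian noise
  X = zeros(T, 1);
  for t = 2:T
      X(t) = a * X(t-1) + W(t);
  end
  plot(X)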
Birth-death process
- For a queue with customers, we can model the variation of the number of customers as a birth-death process.
The arrival process is assumed to be a Poisson process:

Pr(a(t + τ) − a(t) = n) = \frac{e^{−λτ}}{n!} (λτ)^n,

where a(t) is the accumulated number of arrivals. For a sufficiently short time δ, we have

Pr(no arrival) = Pr(a(t + δ) − a(t) = 0) = 1 − λδ + o(δ)
Pr(one arrival) = Pr(a(t + δ) − a(t) = 1) = λδ + o(δ)
Pr(more than one arrival) = Pr(a(t + δ) − a(t) ≥ 2) = o(δ),

where \lim_{δ→0} o(δ)/δ = 0.
- The service time has the exponential distribution:
  f(s) = µ e^{−µs},
  where µ is the service rate, or T_s = 1/µ is the average service time. For a short time interval δ, we have
  Pr(s > δ) = e^{−µδ} = 1 − µδ + o(δ) = 1 − \frac{δ}{T_s} + o(δ).
- Define the probability P_{i,j} as
  P_{i,j} = Pr{N_{k+1} = j | N_k = i},
  where N_k represents the number of customers in service at the kth instant.
Noting that Pr(no departure) = Pr(s > δ),

P_{0,0} = Pr(no arrival) = 1 − λδ + o(δ)
P_{i,i} = Pr(no arrival) Pr(no departure) + Pr(arrival) Pr(departure)
        = (1 − λδ)(1 − µδ) + λδ µδ + o(δ)
        = 1 − (λ + µ)δ + o(δ)
P_{i,i+1} = Pr(arrival) Pr(no departure) = λδ(1 − µδ) + o(δ) = λδ + o(δ)
P_{i,i−1} = Pr(no arrival) Pr(departure) = (1 − λδ)(µδ) + o(δ) = µδ + o(δ)
A birth-death process: let p_n denote the probability of n customers. Then, in equilibrium, we have

p_{n−1} P_{n−1,n} = p_n P_{n,n−1} ⇒ p_{n−1} λ = p_n µ.

Iterating gives p_n = (λ/µ)^n p_0, which, after normalization, is the equilibrium distribution of the M/M/1 queue (for λ < µ).
Continuous-time random processes
- Example: a sine wave s(t) = sin(2πt + θ) is a random process when the phase θ is a random variable.
- A complete statistical description of a random process X(t) is known if, for any integer n and any choice of the sample instants (t_1, t_2, ..., t_n), the joint pdf
  f_{X(t_1),X(t_2),...,X(t_n)}(x_1, x_2, ..., x_n)
  is given.
- A random process X(t) is a Gaussian process if for any n and all (t_1, t_2, ..., t_n), the random variables X(t_1), X(t_2), ..., X(t_n) have a jointly Gaussian density function.
- A random process X(t) is a wide sense stationary (WSS) process if
  E[X(t)] = µ (independent of time t)
  and
  E[X(t) X(t + τ)] = R_X(τ),
  where R_X(τ) is called the autocorrelation function of X(t).
- The spectral density is defined as
  S_X(f) = \int_{−∞}^{∞} R_X(τ) e^{−j2πfτ} dτ.
- A process X(t) is called a white process if it has a flat spectral density, i.e., S_X(f) = C for all f. ⇒ Its autocorrelation is then
  R_X(τ) = C δ(τ).
In most cases, we deal with only some simple random processes (or random signals) in communications. Some examples are as follows:
- White Gaussian noise n(t): it is WSS and has the spectrum
  S_N(f) = \frac{N_0}{2} ∀f,
  where N_0/2 is called the "double-sided" spectral density.
- Random-phase sinusoid: the random signal is written as
  s(t) = sin(2πft + θ),
  where θ is a uniform r.v. over [0, 2π).
(Figure: a sample path of white noise)

(Figure: sine waves with random phases)
Q) Is s(t) = sin(2πft + θ) WSS?
A) We need to verify:
1. the mean (it should be constant over time),
2. the autocorrelation (it should be a function of the time difference only).
The mean is

E[sin(2πft + θ)] = \int_0^{2π} sin(2πft + θ) \frac{1}{2π} dθ = 0.
The autocorrelation is

E[sin(2πft + θ) sin(2πf(t + τ) + θ)]
= \int_0^{2π} sin(2πft + θ) sin(2πf(t + τ) + θ) \frac{1}{2π} dθ
= \frac{1}{2π} \int_0^{2π} \frac{1}{2} (cos(2πfτ) − cos(4πft + 2πfτ + 2θ)) dθ
= \frac{1}{2} cos(2πfτ).

⇒ E[s(t) s(t + τ)] = R_S(τ) = \frac{1}{2} cos(2πfτ).

This shows that sin(2πft + θ) is WSS.
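An ensemble-average check of this result; a MATLAB sketch (not from the slides; f, t0, and the τ grid are arbitrary choices, and the broadcasting relies on implicit expansion, available in R2016b and later):

  % Empirical R_S(tau) vs. (1/2) cos(2*pi*f*tau)
  f = 5; t0 = 0.13; tau = 0:0.01:0.4; N = 1e5;
  theta = 2*pi*rand(N, 1);
  R = mean(sin(2*pi*f*t0 + theta) .* sin(2*pi*f*(t0 + tau) + theta), 1);
  plot(tau, R, tau, 0.5*cos(2*pi*f*tau), '--')   % the curves should overlap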
Filtering of stochastic signals

Let h(t) be the impulse response of a linear system. The input signal to the linear system is a stochastic process X(t). Then, the output is written as

Y(t) = h(t) ∗ X(t) = \int_τ h(τ) X(t − τ) dτ.

- Mean: E[Y(t)] = h(t) ∗ E[X(t)]
- Autocorrelation function:
  R_Y(τ) = R_X(τ) ∗ h(τ) ∗ h(−τ)
- Power spectrum:
  S_Y(f) = |H(f)|^2 S_X(f).
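These relations can be seen numerically by filtering a white signal; a MATLAB sketch (not from the slides; the filter length and FFT size are arbitrary, and the raw periodogram is noisy but fluctuates around |H(f)|^2):

  % Filtering a white signal: the output spectrum follows |H(f)|^2
  N = 2^16;
  x = randn(N, 1);                 % approximately white input, S_X(f) ~ 1
  h = ones(32, 1) / 32;            % rectangular-pulse (moving-average) filter
  y = filter(h, 1, x);
  Y  = abs(fft(y)).^2 / N;         % periodogram of the output
  H2 = abs(fft(h, N)).^2;          % |H(f)|^2
  plot(10*log10(Y(1:N/2))); hold on; plot(10*log10(H2(1:N/2)))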
Proofs
- E[Y(t)] = h(t) ∗ E[X(t)]:
  E[Y(t)] = E\left[\int h(t − τ) X(τ) dτ\right] = \int h(t − τ) E[X(τ)] dτ = \int h(t − τ) µ(τ) dτ = h(t) ∗ E[X(t)].
i) R_Y(τ) = R_X(τ) ∗ h(τ) ∗ h(−τ):

R_Y(τ) = E[Y(t) Y(t − τ)]
       = E\left[\left(\int h(u) X(t − u) du\right)\left(\int h(v) X(t − τ − v) dv\right)\right]
       = \int\int h(u) h(v) E[X(t − u) X(t − τ − v)] du dv
       = \int\int h(u) h(v) R_X(τ + v − u) du dv
       = \int [h(τ + v) ∗ R_X(τ + v)] h(v) dv.

Let R_{YX}(τ) = h(τ) ∗ R_X(τ). Then,

R_Y(τ) = \int R_{YX}(τ + v) h(v) dv = h(−τ) ∗ R_{YX}(τ) = h(τ) ∗ h(−τ) ∗ R_X(τ).

ii) S_Y(f) = |H(f)|^2 S_X(f): directly obtained from the above by taking the Fourier transform.
A rectangular-pulse filter and the output spectrum

(Figure: a white signal in the time domain, filtered by a rectangular pulse, and the resulting output spectrum in the frequency domain)
Bandlimited process and sampling

Let X(t) be a stationary bandlimited process, i.e., S_X(f) = 0 for |f| ≥ W. Then, the following relation holds (mean-square stochastic convergence):

E\left|X(t) − \sum_{k=−∞}^{∞} X(kT_s) sinc(2W(t − kT_s))\right|^2 = 0,

where T_s = 1/2W.
Note: E[|X − S|^2] = 0 means that X and S are the same in the mean-square error sense.
Ergodic processes
- A strictly stationary (SS) process is a process in which, for all n, all (t_1, t_2, ..., t_n), and all Δ,
  f_{X(t_1),X(t_2),...,X(t_n)}(x_1, x_2, ..., x_n) = f_{X(t_1+Δ),X(t_2+Δ),...,X(t_n+Δ)}(x_1, x_2, ..., x_n).
  A process is called Mth-order stationary if the above condition holds for all n ≤ M.
- A SS process is a WSS process.
Two types of averages for a SS process X(t):
- Ensemble average:
  E[g(X(t_0))] = \int g(x) f_{X(t_0)}(x) dx.
  Note: E[g(X(t_0))] should be independent of t_0.
- Time average (over the ith realization x(t; w_i)):
  ⟨g(X(t_0))⟩_i = \lim_{T→∞} \frac{1}{T} \int_{−T/2}^{T/2} g(x(t; w_i)) dt.
Ergodic: if it happens that, for all functions g(x), ⟨g(X(t_0))⟩_i is independent of i and equal to E[g(X(t_0))], then the process is called ergodic.
- Power spectrum of an ergodic process:
  S_X(f) = E\left[\lim_{T→∞} \frac{|X_T(f)|^2}{T}\right] = \lim_{T→∞} \frac{E[|X_T(f)|^2]}{T},
  where X_T(f) = \int_{−T/2}^{T/2} X(t) e^{−j2πft} dt.