Probability and Random Processes
Jinho Choi, GIST
February 2017
What Albert Einstein said:
- "As I have said so many times, God doesn't play dice with the world."
- "Two things are infinite: the universe and human stupidity."
So, he did believe in infinity, but not in randomness.
Main aims
- Understand fundamental notions of probability theory (pdf, mean, and variance)
- Understand joint and conditional pdfs, independence, and correlation
- Learn properties of Gaussian random variables and be able to derive the Chernoff bound
- Understand random processes and their key definitions
- Be able to compute the mean and variance of samples from random processes
Probability Theory

The three axioms, with a sample space Ω, a family F of allowable events, and a measure Pr(·):
- The first axiom: the probability of an event E is a non-negative real number:
  Pr(E) ≥ 0.
- The second axiom: Pr(Ω) = 1.
- The third axiom: for any countable sequence of mutually exclusive events E_1, E_2, ...,
  Pr(E_1 ∪ E_2 ∪ ···) = \sum_{i=1}^{∞} Pr(E_i).
Random variables: a random variable is a mapping from the sample space Ω to the set of real numbers:

sample space (abstract space) ⇒ real numbers.

The main idea of random variables is to describe random events by numbers.

(Figure: an event ω in the event space Ω is mapped to a real number X(ω))

X: random variable: ω ∈ Ω → X(ω) ∈ (−∞, ∞)
Example of random variables: a game of dice
- Before you roll a die, the number of dots is unknown. ⇒ This number can be considered as a random variable.
- Once the die is rolled, we have a particular number, which would be one of 1, ..., 6. This is called a "realization."

(Figure: dice showing realizations of 2, 3, 4)
Cumulative distribution function (CDF):

F_X(x) = Pr(X ≤ x),

where X is the random variable (r.v.) and x is a real number. By definition, the CDF is a nondecreasing function.

Probability density function (PDF):

f_X(x) = \frac{d}{dx} F_X(x).

Example: a die. The CDF F_X(x) = Pr(X ≤ x) is a staircase function of x that jumps by 1/6 at each of x = 1, 2, ..., 6, taking the values 1/6, 2/6, ..., 5/6, 1.
There are different types of r.v.'s, such as:
- continuous r.v.: X takes continuous values
- discrete r.v.: X takes discrete values
Examples:
- the phase θ of a sinusoid sin(ω_c t + θ) ⇒ continuous r.v.
- the number of dots on a die ⇒ discrete r.v.
A discrete r.v. with binomial distribution
- Consider a random experiment that has two possible outcomes. For example, the outcome of this experiment can be expressed (1 to denote success; 0 to denote failure) as
  Y = 1 with probability p; 0 with probability 1 − p.
- Consider a sum of n outcomes from independent experiments:
  X = \sum_{j=1}^{n} Y_j ∈ Ω = {0, 1, ..., n}.
- Then, X is a binomial random variable with parameters n and p, X ~ B(n, p):
  Pr(X = j) = \binom{n}{j} p^j (1 − p)^{n−j}, j = 0, ..., n.
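The pmf is easy to check by simulation. A minimal MATLAB sketch (not from the slides; n, p, and N are arbitrary choices, and only base MATLAB functions are used):

  % Empirical check of the B(n, p) pmf
  N = 1e5; n = 10; p = 0.3;
  X = sum(rand(n, N) < p, 1);        % N realizations of X = sum of n Bernoulli(p)
  emp = histc(X, 0:n) / N;           % empirical pmf
  thy = arrayfun(@(j) nchoosek(n, j) * p^j * (1 - p)^(n - j), 0:n);
  disp([(0:n)' emp(:) thy(:)])       % the two pmf columns should be close
  mean(X)                            % approx. n*p (see the derivation below)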
A continuous r.v. X has the probability density function (pdf)

f_X(x) = \frac{d}{dx} F_X(x).

- As F_X(x) is nondecreasing, f_X(x) ≥ 0.
- In general, \int_{−∞}^{t} f_X(x) dx = F_X(t).
- Since \lim_{x→∞} F_X(x) = 1, \int_{−∞}^{∞} f_X(x) dx = 1.

For a discrete r.v., the pdf becomes the probability mass function (pmf), which is an actual probability.
- Example of a die:
  f_X(k) → Pr(X = k) = 1/6, k = 1, 2, ..., 6.
Mean and Variance

For a r.v. X, the mean of X (or of g(X), where g(x) is a function of x) is given by
- for a continuous r.v.:
  E[X] = \int x f_X(x) dx or E[g(X)] = \int g(x) f_X(x) dx
- for a discrete r.v.:
  E[X] = \sum_k x_k Pr(x_k) or E[g(X)] = \sum_k g(x_k) Pr(x_k)

The variance is given by

Var(X) = E[(X − E[X])^2].

The mean and variance are used to characterize a random variable.
Mean of X ~ B(n, p):

E[X] = \sum_{j=0}^{n} j \binom{n}{j} p^j (1 − p)^{n−j}
     = \sum_{j=1}^{n} \frac{j n!}{j!(n−j)!} p^j (1 − p)^{n−j}
     = \sum_{j=1}^{n} \frac{n(n−1)!}{(j−1)!(n−j)!} p \, p^{j−1} (1 − p)^{n−j}
     = np \sum_{j=1}^{n} \frac{(n−1)!}{(j−1)!(n−j)!} p^{j−1} (1 − p)^{n−j}
     = np (p + (1 − p))^{n−1} = np.
Geometric random variable:
- pmf: Pr(X = k) = (1 − p)^{k−1} p, k ≥ 1. Note that
  \sum_{k=1}^{∞} (1 − p)^{k−1} p = p \sum_{k=0}^{∞} (1 − p)^k = p \frac{1}{1 − (1 − p)} = 1.
- Example: the number of independent flips of a coin until a head first appears.
- Mean: letting q = 1 − p, it can be shown that
  E[X] = \sum_{k=1}^{∞} k (1 − p)^{k−1} p = p \frac{d}{dq} \sum_{k=0}^{∞} q^k = p \frac{d}{dq} \frac{1}{1 − q} = p \frac{1}{(1 − q)^2} = \frac{1}{p}.
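Geometric samples can be drawn by inverting the cdf F(k) = 1 − (1 − p)^k, anticipating the transformation method discussed later. A minimal MATLAB sketch (not from the slides; p and N are arbitrary):

  % Inverse-transform sampling of the geometric distribution
  p = 0.25; N = 1e5;
  X = ceil(log(rand(N, 1)) ./ log(1 - p));   % U and 1-U have the same distribution
  mean(X)                                    % should be close to 1/p = 4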
A continuous r.v. with uniform distribution

Let us consider a uniform r.v. X that has the pdf

f_X(x) = 1/A for 0 ≤ x ≤ A; 0 otherwise.

The mean is

E[X] = \int_0^A x \frac{1}{A} dx = \frac{1}{A} \frac{x^2}{2} \Big|_{x=0}^{A} = \frac{A}{2}.

The variance is

E[(X − E[X])^2] = \int_0^A \left(x − \frac{A}{2}\right)^2 \frac{1}{A} dx = \int_{−A/2}^{A/2} z^2 \frac{1}{A} dz = \frac{1}{A} \cdot \frac{1}{3} z^3 \Big|_{−A/2}^{A/2} = \frac{A^2}{12}.
More examples on expectation

Q) Let X be a r.v. and let a and c be constants. Show that E[aX] = a E[X] and E[X + c] = E[X] + c.
A)

E[aX] = \int a x f_X(x) dx = a \int x f_X(x) dx = a E[X]
E[X + c] = \int (x + c) f_X(x) dx = \int x f_X(x) dx + \int c f_X(x) dx = E[X] + c

Q) Show that E[X^2] = Var(X) + (E[X])^2.
A)

E[(X − E[X])^2] = E[X^2 − 2 E[X] X + (E[X])^2] = E[X^2] − 2 E[X] E[X] + (E[X])^2 = E[X^2] − (E[X])^2.
Q) Show that Var(aX + c) = a^2 Var(X).

Q) Suppose that X is a r.v. with mean 1 and variance 3. Find E[3X^2 + 2X].
A)

E[3X^2 + 2X] = 3 E[X^2] + 2 E[X] = 3 (Var(X) + (E[X])^2) + 2 E[X] = 3 × (3 + 1^2) + 2 × 1 = 14.
Jensen's inequality: for a convex function g(x),

E[g(X)] ≥ g(E[X]).

A convex function satisfies

λ g(x_1) + (1 − λ) g(x_2) ≥ g(λ x_1 + (1 − λ) x_2), λ ∈ [0, 1].

For example, with the convex function g(x) = x^2, Jensen's inequality gives E[X^2] ≥ (E[X])^2, consistent with Var(X) ≥ 0.
Gaussian or normal random variable:

f_X(x) = N(µ, σ^2) = \frac{1}{\sqrt{2π} σ} \exp\left(−\frac{(x − µ)^2}{2σ^2}\right),

where the mean is

E[X] = \int_{−∞}^{∞} x f_X(x) dx = µ

and the variance is

E[(X − E[X])^2] = \int_{−∞}^{∞} (x − µ)^2 f_X(x) dx = σ^2.
(Figure: normal or Gaussian pdfs)

(Figure: normal or Gaussian cdfs)
Q-function: the tail of the standard normal pdf. For X ~ N(0, 1),

Pr(X ≥ x) = Q(x) ≜ \int_x^{∞} \frac{1}{\sqrt{2π}} \exp\left(−\frac{t^2}{2}\right) dt.
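In MATLAB, Q(x) can be evaluated with base functions through the identity Q(x) = (1/2) erfc(x/√2); the dedicated qfunc is, to my knowledge, part of the Communications Toolbox. A small sketch:

  % Q-function via the complementary error function
  Qfun = @(x) 0.5 * erfc(x ./ sqrt(2));
  Qfun([0 1 2 3])   % approx. 0.5000, 0.1587, 0.0228, 0.0013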
Conditional probability

The conditional probability of an event A given B is denoted and given by

Pr(A | B) = \frac{Pr(A, B)}{Pr(B)}.

Q) Find the probability that the face with one dot occurs given that an odd number of dots is observed.
A) We have

Pr(odd) = \frac{1}{2} and Pr(1, odd) = Pr(1) = \frac{1}{6}.

Hence, it follows that

Pr(1 | odd) = \frac{Pr(1, odd)}{Pr(odd)} = \frac{1}{3}.
Multiple random variables: the joint pdf is written as

f_{XY}(x, y) = \frac{∂^2}{∂x ∂y} F_{XY}(x, y) = \frac{∂^2}{∂x ∂y} Pr(X ≤ x, Y ≤ y).

Conditional pdf:

f_{X|Y}(x|y) = \frac{f_{XY}(x, y)}{f_Y(y)} if f_Y(y) ≠ 0; 0 otherwise.

Marginalization:

f_X(x) = \int f_{XY}(x, y) dy or f_Y(y) = \int f_{XY}(x, y) dx.
Expectation with two r.v.'s: for continuous r.v.'s, a double integral is needed:

E[g(X, Y)] = \int\int g(x, y) f_{XY}(x, y) dx dy (continuous)
E[g(X, Y)] = \sum_x \sum_y g(x, y) Pr(X = x, Y = y) (discrete).

The conditional expectation is defined by

E[X | Y] = \int x f_{X|Y}(x|Y) dx (continuous)
E[X | Y] = \sum_x x Pr(X = x | Y) (discrete).

Note that E[X | Y] = g(Y) is a function of Y, which is itself a random variable.
Q) Show that E[XY] = E[Y × E[X | Y]].
A)

E[XY] = \int\int x y f_{XY}(x, y) dx dy
      = \int\int x y f_{X|Y}(x|y) f_Y(y) dx dy
      = \int y \left(\int x f_{X|Y}(x|y) dx\right) f_Y(y) dy
      = \int y E[X | y] f_Y(y) dy = E[Y E[X | Y]].
Exponential distribution and memorylessness
- The exponential distribution is given by
  f(x; λ) = λ e^{−λx} if x ≥ 0; 0 otherwise.
- The mean and variance are 1/λ and 1/λ^2, respectively.
- An exponentially distributed random variable T obeys the following relation:
  Pr(T > s + t | T > s) = Pr(T > t), s, t ≥ 0.
  This property is called memorylessness. (It follows since Pr(T > t) = e^{−λt}, so Pr(T > s + t)/Pr(T > s) = e^{−λ(s+t)}/e^{−λs} = e^{−λt}.)
Independence and correlation

The joint cdf (of more than 2 r.v.'s) is written as

F_{X_1,...,X_n}(x_1, ..., x_n) = Pr(X_1 ≤ x_1, ..., X_n ≤ x_n).

The joint pdf is given by

f_{X_1,...,X_n}(x_1, ..., x_n) = \frac{∂^n}{∂x_1 ··· ∂x_n} F_{X_1,...,X_n}(x_1, ..., x_n).

The marginal pdf is given by

f_{X_1}(x_1) = \int_{x_2} ··· \int_{x_n} f_{X_1,...,X_n}(x_1, ..., x_n) dx_2 ··· dx_n.

If X_1, ..., X_n are independent, then

F_{X_1,...,X_n}(x_1, ..., x_n) = F_{X_1}(x_1) ··· F_{X_n}(x_n)

and

f_{X_1,...,X_n}(x_1, ..., x_n) = f_{X_1}(x_1) ··· f_{X_n}(x_n).
- Note: if X and Y are independent,
  f_{X|Y}(x|y) = \frac{f_{XY}(x, y)}{f_Y(y)} = f_X(x).
- The correlation of X and Y is defined as
  corr(X, Y) = E[XY]
  and the covariance is defined as
  cov(X, Y) = E[(X − E[X])(Y − E[Y])].
  If cov(X, Y) = 0, X and Y are said to be uncorrelated (not "uncovarianced").
Joint Gaussian random variables: let us define the random vector

x = [X_1 X_2 ··· X_n]^T.

The random variables X_i are jointly Gaussian if the pdf can be written as

f(x_1, x_2, ..., x_n) = \frac{1}{\sqrt{(2π)^n \det(C)}} \exp\left(−\frac{1}{2} (x − m)^T C^{−1} (x − m)\right),

where the mean vector is m = E[x] = [E[X_1] ... E[X_n]]^T and the covariance matrix is

C = E[(x − m)(x − m)^T],

whose (i, j)th entry is E[X_i X_j] − E[X_i] E[X_j].
Moment Generating Function (MGF)
- Goal: finding m_n = E[X^n] (the nth moment).
- Define the MGF as
  M_X(t) = E[e^{tX}].
- Using the Taylor series,
  M_X(t) = E\left[\sum_{k=0}^{∞} \frac{1}{k!} (tX)^k\right] = \sum_{k=0}^{∞} \frac{t^k m_k}{k!}.
- From this, the kth moment can be found as
  m_k = \frac{d^k}{dt^k} M_X(t) \Big|_{t=0}.
- The MGF can be seen as a (two-sided) Laplace transform of the pdf f_X(x), evaluated at −t:
  M_X(t) = \int e^{tx} f_X(x) dx.
  Thus, using the inverse Laplace transform, the pdf can be found from the MGF.
- Let X and Y be independent random variables. The MGF of Z = X + Y is given by
  M_Z(t) = E[e^{tZ}] = E[e^{t(X+Y)}] = M_X(t) M_Y(t).
  Thus, the pdf of Z is the inverse Laplace transform of M_X(t) M_Y(t).
- Equivalently, the pdf of Z is the convolution of the pdfs of X and Y: f_Z(x) = f_X(x) ∗ f_Y(x).
Sum of independent Gaussian random variables: let X_i be independent Gaussian r.v.'s, X_i ~ N(µ_i, σ_i^2). Then,

Y = \sum_i X_i

is also a Gaussian r.v.
- The MGF of X_i is M_i(t) = \exp(µ_i t + \frac{1}{2} σ_i^2 t^2).
- The MGF of Y is
  M_Y(t) = \prod_i M_i(t) = \exp\left(\sum_i µ_i t + \frac{1}{2} \sum_i σ_i^2 t^2\right)
  ⇒ Y ~ N\left(\sum_i µ_i, \sum_i σ_i^2\right).
Gaussian related distributions
- χ^2 distribution: the pdf of the χ^2 distribution with N degrees of freedom is written as
  f_X(x) = \frac{1}{Γ(N/2) 2^{N/2}} x^{(N/2)−1} e^{−x/2}, x ≥ 0,
  where Γ(x) = \int_0^∞ t^{x−1} e^{−t} dt is the Gamma function. If x = n is an integer,
  Γ(n) = (n − 1)!.
  Let X_1, X_2, ..., X_N be independent and identically distributed (i.i.d.) random variables (r.v.'s) with X_i ~ N(0, 1). Then, Y = \sum_{i=1}^{N} X_i^2 is the χ^2 r.v. with N degrees of freedom.
(Figure: χ^2 pdfs for several degrees of freedom)
- The χ^2 distribution with 2 degrees of freedom is an exponential distribution:
  f_X(x) = \frac{1}{2} e^{−x/2}, x ≥ 0.
- The exponential distribution with parameter λ (denoted by X ~ Exp(λ)) is given by
  f(x; λ) = λ e^{−λx} for x ≥ 0; 0 for x < 0.
- The mean and variance of X ~ Exp(λ) are
  E[X] = \frac{1}{λ}, Var(X) = \frac{1}{λ^2}.
- Let X_1 and X_2 be i.i.d. Gaussian r.v.'s with X_i ~ N(0, 1). Define Y = X_1 + jX_2. Then, Y is a circularly symmetric complex Gaussian (CSCG) r.v. The mean, pseudo-variance, and variance are
  E[Y] = 0, E[Y^2] = 0, and E[|Y|^2] = 2.
- From a zero-mean CSCG random vector to a real-valued Gaussian random vector:
  y = x_1 + jx_2 ⇒ [x_1; x_2] ~ N([0; 0], [E[x_1 x_1^T], E[x_1 x_2^T]; E[x_2 x_1^T], E[x_2 x_2^T]]).
- Then, what is the distribution of z = Ay if y is a zero-mean CSCG random vector?
- The real-valued representation of A:
  A → A_r = [Re(A), −Im(A); Im(A), Re(A)].
- Noting that
  z → [Re(z); Im(z)] = A_r [Re(y); Im(y)] = A_r [x_1; x_2],
  it can be shown that
  [Re(z); Im(z)] ~ N([0; 0], A_r [E[x_1 x_1^T], E[x_1 x_2^T]; E[x_2 x_1^T], E[x_2 x_2^T]] A_r^T).
Remarks:
- Let X be a CSCG r.v. Then, |X|^2 becomes a χ^2 r.v. (with 2 degrees of freedom).
- The Gamma function:
  Γ(n) = \int_0^∞ t^{n−1} e^{−t} dt, n > 0.
  Some properties:
  Γ(n + 1) = n Γ(n)
  and, for integer n,
  Γ(n + 1) = n!.
  In addition,
  Γ(1/2) = \sqrt{π}.
Rician and Rayleigh random variables
- The Rayleigh r.v. has a single-parameter pdf given by
  f_X(x) = \frac{x}{σ^2} e^{−x^2/2σ^2}, x ≥ 0,
  and the cdf is
  F_X(x) = 1 − e^{−x^2/2σ^2}, x ≥ 0.
  If X_1 and X_2 are independent Gaussian random variables with mean 0 and variance σ^2, then X = \sqrt{X_1^2 + X_2^2} is a Rayleigh r.v.
- The Rician pdf is written as
  f_X(x) = \frac{x}{σ^2} I_0\left(\frac{µx}{σ^2}\right) e^{−(x^2 + µ^2)/2σ^2}, x ≥ 0,
  where I_0(x) is the zero-order modified Bessel function of the first kind. It can be shown that if X_1 and X_2 are independent Gaussian, with X_1 ~ N(µ, σ^2) and X_2 ~ N(0, σ^2), then X = \sqrt{X_1^2 + X_2^2} is a Rician r.v.
Approximations: from binomial to Poisson to Gaussian
- A (discrete) Poisson random variable with parameter λ, denoted by Pois(λ), has the following distribution:
  Pr(X = k) = \frac{e^{−λ} λ^k}{k!}, k = 0, 1, ...
  The following facts are often useful:
- A binomial random variable ~ B(n, p) can be approximated by a Poisson random variable with λ = np if n is sufficiently large (and p is small).
- A Poisson random variable with parameter λ can be approximated by a Gaussian random variable if λ is large.
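A quick numerical comparison of the two pmfs; a MATLAB sketch (not from the slides; n, p, and the range of k are arbitrary choices):

  % Poisson approximation of B(n, p) for large n and small p
  n = 100; p = 0.03; lambda = n * p; k = 0:10;
  binom = arrayfun(@(j) nchoosek(n, j) * p^j * (1 - p)^(n - j), k);
  poiss = exp(-lambda) .* lambda.^k ./ factorial(k);
  disp([k' binom' poiss'])   % the two pmfs should agree closely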
Limit of X ~ B(n, p) (taking a heuristic approach):

Pr(X = k) = \binom{n}{k} p^k (1 − p)^{n−k}
          ≈ \frac{n^k}{k!} p^k \frac{(1 − p)^n}{(1 − p)^k} (for large n)
          ≈ \frac{(np)^k}{k!} \frac{e^{−pn}}{e^{−pk}} (using (1 − x) ≈ e^{−x} for |x| ≪ 1).

Let λ = pn and take n → ∞, which means that p approaches 0 (so e^{−pk} → 1). Then,

Pr(X = k) → \frac{λ^k e^{−λ}}{k!},

which is the Poisson distribution with parameter λ.
Sum of independent Poisson r.v.'s is also Poisson
- The MGF of Pois(λ):
  M_X(t) = E[e^{tX}] = \sum_{k=0}^{∞} e^{tk} \frac{λ^k e^{−λ}}{k!} = e^{−λ} \sum_{k=0}^{∞} \frac{(e^t λ)^k}{k!} = e^{−λ} e^{e^t λ} = \exp(λ(e^t − 1)).
- Z = X + Y, where X ~ Pois(λ_X) and Y ~ Pois(λ_Y), has the MGF
  M_Z(t) = M_X(t) M_Y(t) = e^{λ_X(e^t − 1)} e^{λ_Y(e^t − 1)} = e^{(λ_X + λ_Y)(e^t − 1)},
  which leads to Z ~ Pois(λ_X + λ_Y).
Standardized Poisson random variable approaches Gaussian
- Let X ~ Pois(λ).
- E[X] = λ and Var(X) = λ (check these!).
- The standardized Poisson r.v. becomes
  \frac{X − E[X]}{\sqrt{Var(X)}} = \frac{X − λ}{\sqrt{λ}} (→ Z ~ N(0, 1) as λ → ∞).
- The MGF is
  E[e^{t(X−λ)/\sqrt{λ}}] = E[e^{(t/\sqrt{λ})X}] e^{−t\sqrt{λ}} = \exp(λ(e^{t/\sqrt{λ}} − 1)) e^{−t\sqrt{λ}}
  = \exp\left(λ\left(\frac{t}{\sqrt{λ}} + \frac{t^2}{2!λ} + \frac{t^3}{3!λ^{3/2}} + ...\right) − t\sqrt{λ}\right)
  → \exp\left(\frac{t^2}{2}\right) as λ → ∞,
  which is the MGF of Z ~ N(0, 1).
Transformation Method
- Suppose that U is uniformly distributed in the interval [0, 1].
- From U, we may want to generate a r.v. X that has cdf F_X(x).
- To this end, define
  Z = F_X^{−1}(U).
- Then, since Pr(U < t) = t for t ∈ [0, 1],
  Pr(Z ≤ x) = Pr(F_X^{−1}(U) ≤ x) = Pr(U ≤ F_X(x)) = F_X(x).
- That is, Z is a r.v. with cdf F_X(x).
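For instance, the exponential cdf inverts in closed form, F^{−1}(u) = −ln(1 − u)/λ. A minimal MATLAB sketch (not from the slides; λ and N are arbitrary, and −log(U) is used since U and 1 − U have the same distribution):

  % Transformation-method sampling of Exp(lambda)
  lambda = 2; N = 1e5;
  X = -log(rand(N, 1)) / lambda;
  [mean(X) var(X)]   % approx. 1/lambda = 0.5 and 1/lambda^2 = 0.25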
Example: consider a circle of radius R. We want to generate r.v.'s that are uniformly distributed over the area of this circle.
- Using polar coordinates, let the r.v. uniformly distributed over the circle, denoted by Y, be
  Y = X e^{jθ},
  where θ ~ U(0, 2π) and X ~ f_X(x) = \frac{2x}{R^2}, 0 ≤ x ≤ R.
- Since F_X(x) = \frac{x^2}{R^2},
  X = F_X^{−1}(U) = R\sqrt{U}.
- MATLAB script:
  >> theta = (2*pi).*rand(N,1);
  >> X = R.*sqrt(rand(N,1));
(Figure: R = 2, scatter plot of 1000 samples uniformly distributed over the circle)
Monte Carlo Method
- The Monte Carlo method is a set of approaches that perform parameter estimation by simulating probability models.
- It is useful when analytical means (i.e., pencil-and-paper methods) are not available.
- It can be used for fairly complex cases such as communication systems and networks.
- In the Monte Carlo method, the generation of r.v.'s is very important.
(Figure: R = 1, N = 1000 samples from the uniform distribution over the square [0, 1] × [0, 1])

π ≈ \frac{M/N}{R^2/4}, where M is the number of samples whose length (distance from the origin) is less than or equal to R among the N samples.
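A minimal MATLAB sketch of this estimator (not from the slides; N is an arbitrary choice):

  % Monte Carlo estimate of pi with R = 1 and the unit square
  N = 1e6;
  x = rand(N, 1); y = rand(N, 1);
  M = sum(x.^2 + y.^2 <= 1);   % samples inside the quarter circle
  pi_est = 4 * M / N           % approx. 3.14 for large N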
Central Limit Theorem (CLT)
- Let X_i be a sequence of independent and identically distributed (iid) random variables with E[X_i] = µ and Var(X_i) = σ^2 < ∞.
- Then,
  \sqrt{n}\left(\left(\frac{1}{n} \sum_{i=1}^{n} X_i\right) − µ\right) \xrightarrow{d} N(0, σ^2),
  i.e., the distribution of \sqrt{n}((1/n) \sum_{i=1}^{n} X_i − µ) converges to the zero-mean Gaussian distribution with the same variance σ^2.
- Roughly, it says that a (normalized) sum of independent random variables converges to a Gaussian random variable.
- There are variations of the CLT where the independence and identical-distribution conditions are relaxed.
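A quick illustration with uniform r.v.'s; a MATLAB sketch (not from the slides; n and N are arbitrary, and the mean and variance of U[0, 1] are 1/2 and 1/12):

  % CLT: standardized sample means of U[0,1] r.v.'s approach N(0, 1)
  n = 30; N = 1e5;
  mu = 0.5; sigma = sqrt(1/12);
  Z = sqrt(n) * (mean(rand(n, N), 1) - mu) / sigma;
  [mean(Z) var(Z)]   % approx. 0 and 1
  hist(Z, 50)        % the histogram should look Gaussian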
Since the pdf of Z = X + Y is the convolution of those of X and Y, we can perform the convolution repeatedly to see the pdf of a sum of X_i's. Starting from an essentially arbitrary pdf (the top-left panel), the repeated convolutions approach a Gaussian shape.

(Figure: four panels showing a pdf repeatedly convolved with itself, approaching a bell shape)
Chernoff bound: suppose that u(x) is the step function, where u(x) = 1 if x ≥ 0, and u(x) = 0 otherwise. Then, it follows that

Pr(X > x) = \int_x^{∞} f_X(z) dz = \int_{−∞}^{∞} u(z − x) f_X(z) dz = E[u(X − x)].

Since u(x) ≤ e^x, it can be shown that

Pr(X > x) ≤ E[e^{X−x}].

More generally,

Pr(X > x) ≤ E[e^{λ(X−x)}], λ ≥ 0.

(Figure: the step function u(x − a) upper-bounded by e^{x−a})
The Chernoff bound is

Pr(X > x) ≤ \min_{λ>0} e^{−λx} E[e^{λX}] = \min_{λ>0} e^{−λx} M_X(λ).

- As long as a random variable has an MGF, the Chernoff bound can be found.
- In some cases, the Chernoff bound for a sample mean, X̄_n = \frac{1}{n} \sum_{i=1}^{n} X_i, of iid X_i's can be expressed as
  Pr(X̄_n > x) ≈ e^{−n ℓ(x)}, x > E[X_i],
  where ℓ(x) is called the rate function. This means that the tail probability decreases exponentially with n. The related theory is called large deviations theory, which is about rare events.
Chernoff bound for Gaussian random variables

Let X be a Gaussian r.v. with mean µ and variance σ^2. Then,

E[e^{λX}] = \frac{1}{\sqrt{2π}σ} \int_{−∞}^{∞} e^{λx} e^{−\frac{1}{2σ^2}(x−µ)^2} dx.

It can be shown that

e^{λx} e^{−\frac{1}{2σ^2}(x−µ)^2} = \exp\left(−\frac{1}{2σ^2}\left((x − µ)^2 − 2σ^2 λx\right)\right)
= \exp\left(−\frac{1}{2σ^2}\left((x − µ − σ^2 λ)^2 − 2σ^2 λµ − σ^4 λ^2\right)\right).
Thus,

E[e^{λX}] = \frac{1}{\sqrt{2π}σ} \int_{−∞}^{∞} e^{λx} e^{−\frac{1}{2σ^2}(x−µ)^2} dx
= \frac{1}{\sqrt{2π}σ} \int_{−∞}^{∞} \exp\left(−\frac{1}{2σ^2}\left((x − µ − σ^2λ)^2 − 2σ^2λµ − σ^4λ^2\right)\right) dx
= \frac{1}{\sqrt{2π}σ} \int_{−∞}^{∞} \exp\left(−\frac{1}{2σ^2}(x − µ − σ^2λ)^2\right) dx × e^{λµ + \frac{1}{2}λ^2σ^2}
= e^{λµ + \frac{1}{2}λ^2σ^2}.
The Chernoff bound is written as

Pr(X > x) ≤ \min_{λ≥0} E[e^{λ(X−x)}] = \min_{λ≥0} e^{λ(µ−x) + \frac{1}{2}λ^2σ^2}.

We have the minimum at λ_opt = \frac{x − µ}{σ^2}, which gives

Pr(X > x) ≤ e^{−\frac{(x−µ)^2}{2σ^2}}.

From this, with X ~ N(0, 1), we have

Pr(X > x) = Q(x) ≤ e^{−x^2/2}.

For performance analysis of communication systems, we often use the Q-function. Since the Q-function is defined by an integral, the Chernoff bound, which is in closed form, can simplify the performance analysis.
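The bound is easy to inspect numerically; a MATLAB sketch (not from the slides; the grid of x values is arbitrary):

  % Chernoff bound vs. the exact Q-function for N(0, 1)
  x  = 0:0.5:4;
  Q  = 0.5 * erfc(x / sqrt(2));   % exact tail probability
  cb = exp(-x.^2 / 2);            % Chernoff bound
  disp([x' Q' cb'])               % the bound lies above Q(x) everywhere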
Law of Large Numbers (LLN)
- Two versions: the Strong LLN (SLLN) and the Weak LLN (WLLN).
- With X_i, consider the sample mean X̄_n = \frac{1}{n} \sum_{i=1}^{n} X_i, where E[X_i] = µ.
- The WLLN says
  X̄_n → µ in probability,
  or, for any ε > 0,
  \lim_{n→∞} Pr(|X̄_n − µ| > ε) = 0.
- The SLLN says
  X̄_n → µ with probability 1,
  or
  Pr(\lim_{n→∞} X̄_n = µ) = 1.
- SLLN → WLLN, but not WLLN → SLLN.
- The sample mean of an iid random sequence of finite variance follows the SLLN.
- Relation to the Chernoff bound. Example: consider the sample mean of Gaussian random variables, X̄_n = \frac{1}{n} \sum_{i=1}^{n} X_i ~ N(µ, \frac{σ^2}{n}). The Chernoff bound becomes
  Pr(X̄_n > µ + ε) ≤ e^{−\frac{nε^2}{2σ^2}} or Pr(|X̄_n − µ| > ε) ≤ 2e^{−\frac{nε^2}{2σ^2}}.
  As n → ∞, e^{−nε^2/2σ^2} → 0 for any ε > 0. From this, we can show that X̄_n follows the WLLN.
A tour to the SLLN (with the sample mean of Gaussian iid r.v.'s):
- Define the event
  E_n = {|X̄_n − µ| > ε}.
- Then, we have
  \sum_{n=1}^{∞} Pr(E_n) ≤ 2 \sum_{n=1}^{∞} \left(e^{−\frac{ε^2}{2σ^2}}\right)^n = \frac{2}{1 − e^{−ε^2/2σ^2}} < ∞, σ^2 < ∞.
- Thus, according to the Borel-Cantelli lemma,
  Pr(E_n infinitely often (i.o.)) = 0.
- This means that only finitely many of the E_n's occur. Thus, as n → ∞, X̄_n must approach µ w.p. 1 (hence the SLLN).
Note that convergence in probability does not imply convergence with probability 1 (the other way around is correct).
- With constants A and c > 0, for a given n, define X_n as
  X_n = A with probability 1 − \frac{\ln n}{n}; A + c with probability \frac{\ln n}{2n}; A − c with probability \frac{\ln n}{2n}.
- Since \lim_{n→∞} \frac{\ln n}{n} = 0 and
  Pr(|X_n − A| > 0) = Pr(X_n ≠ A) = \frac{\ln n}{n},
  X_n converges to E[X_n] = A in probability, as
  \lim_{n→∞} Pr(|X_n − A| > ε) ≤ \lim_{n→∞} \frac{\ln n}{n} = 0.
- However, there are infinitely many X_n ≠ A (there are about ln n many X_i among X_1, ..., X_n that are not A). Thus, X_n does not converge with probability 1.
- This can also be confirmed (heuristically) as
  \sum_{n=1}^{∞} Pr(E_n) = \sum_{n=1}^{∞} \frac{\ln n}{n} ≈ \int_1^{∞} \frac{\ln x}{x} dx = \frac{\ln^2 x}{2} \Big|_1^{∞} = ∞,
  which implies that the event E_n = {|X_n − A| > ε} occurs infinitely often. Thus, X_n does not converge with probability 1.
Types of Convergence

Complete convergence (\sum_n Pr(|X_n − X| > ε) < ∞) → almost sure convergence (Pr(X_n → X) = 1) → convergence in probability (Pr(|X_n − X| > ε) → 0) → convergence in distribution (F_{X_n}(x) → F_X(x)).

Convergence in the pth mean (E[|X_n − X|^p] → 0) also implies convergence in probability.
Order statistics
- For a given set of random variables X_1, ..., X_n, the order statistic of rank k is the kth smallest value in the data set, denoted by X_{(k)}.
- That is,
  X_{(1)} ≤ ... ≤ X_{(n)}.
- Extreme order statistics:
  X_{(1)} = min{X_1, ..., X_n}, X_{(n)} = max{X_1, ..., X_n}.
- The distributions of X_{(k)} are well known if the X_k's are iid.
- Denote by F(x) and f(x) the cdf and pdf of X.
- The cdf of X_{(k)}:
  F_{(k)}(x) = Pr(X_{(k)} ≤ x) = Pr(N_x ≥ k) = \sum_{j=k}^{n} \binom{n}{j} F^j(x) (1 − F(x))^{n−j},
  where N_x is the number of X_k's that are less than or equal to x:
  N_x = \sum_{k=1}^{n} 1(X_k ≤ x).
- The cdfs of X_{(1)} and X_{(n)}:
  F_{(1)}(x) = 1 − (1 − F(x))^n, F_{(n)}(x) = F^n(x).
- The pdf of X_{(k)}:
  f_{(k)}(x) = \frac{n!}{(k−1)!(n−k)!} F^{k−1}(x) (1 − F(x))^{n−k} f(x).
Example: suppose that X_k ~ Exp(1), f(x) = e^{−x}, x ≥ 0.
- There are n X_k's. Find Pr(max_k X_k ≥ x) when x is sufficiently large.
- Sol: since
  Pr(X_{(n)} ≤ x) = F_{(n)}(x) = (1 − e^{−x})^n,
  it can be shown that
  Pr(max_k X_k ≥ x) = 1 − \left(1 − \frac{1}{e^x}\right)^n ≈ 1 − \exp\left(−\frac{n}{e^x}\right) ≈ 1 − \left(1 − \frac{n}{e^x}\right) = n e^{−x} = n Pr(X ≥ x).
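A simulation check of this tail approximation; a MATLAB sketch (not from the slides; n, N, and x are arbitrary choices, and Exp(1) samples are drawn as −log(U)):

  % Tail of the max of n iid Exp(1) r.v.'s vs. the n*exp(-x) approximation
  n = 20; N = 1e5; x = 8;
  M = max(-log(rand(n, N)), [], 1);   % N realizations of the maximum
  emp = mean(M >= x)                  % empirical tail probability
  apx = n * exp(-x)                   % approx. 0.0067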
Random Processes

- Continuous-time functions with random properties
- Discrete-time functions with random properties (also called time series, usually in economics)
- In this lecture, we focus only on a few random processes, namely Markov processes and wide sense stationary processes.
Markov processes or chains
- A random sequence (a discrete-time random process) with the following property:
  Pr(X_t | X_{t−1}, ..., X_0) = Pr(X_t | X_{t−1}).
- Different classes of X_t ∈ X:
  1. |X| is finite: a finite number of states
  2. |X| is countably infinite, e.g., X = Z
  3. |X| is uncountably infinite, e.g., X = R
- For cases 1 and 2, a transition matrix,
  P_{i,j} = Pr(X_t = j | X_{t−1} = i),
  characterizes the given (time-homogeneous) Markov process.
(Figure: a Markov chain with two states)
- Let
  P_{i,j}^n = Pr(X_{t+n} = j | X_t = i), n ≥ 0.
- The Chapman-Kolmogorov equation is
  P_{i,j}^{n+m} = \sum_k P_{i,k}^n P_{k,j}^m, n, m ≥ 0.
- Let P(n) denote the matrix of n-step transition probabilities, P(n) = [P_{i,j}^n]. Then,
  P(n + m) = P(n) P(m).
- State j is said to be accessible from state i if P_{i,j}^n > 0 for some n ≥ 0, denoted by i → j.
- If i → j and j → i, the two states i and j are said to communicate, denoted by i ↔ j.
- Communication is an equivalence relation; it can be shown that
  i ↔ i;
  if i ↔ j, then j ↔ i;
  if i ↔ j and j ↔ k, then i ↔ k.
- Two states that communicate are said to be in the same class.
- Thus, any two classes in a Markov chain are either disjoint or identical.
- If a Markov chain has one class, it is said to be irreducible.
- Period:
  d(i) = gcd{n > 0 : P_{i,i}^n > 0},
  where "gcd" represents the greatest common divisor.
- If d(i) = 1, state i is aperiodic.
- If i ↔ j, then d(i) = d(j).
- If a Markov chain is irreducible and one state is aperiodic, all the states are aperiodic.
- Hitting time: the first time the chain returns to the initial state i,
  T_i = inf{n : X_n = i | X_0 = i}.
- If there is a positive probability that the process never returns to i starting from X_0 = i, i.e.,
  Pr(T_i < ∞) = \sum_{n=1}^{∞} Pr(T_i = n) < 1,
  state i is said to be transient.
- If state i is not transient, it is recurrent.
- Mean recurrence time:
  M_i = E[T_i] = \sum_{n=1}^{∞} n Pr(T_i = n).
- If M_i is finite, state i is said to be positive recurrent; otherwise, state i is said to be null recurrent.
- Suppose that X is irreducible and aperiodic.
  - All states are transient, or all are positive recurrent, or all are null recurrent.
  - An equilibrium (or stationary) probability distribution π exists if and only if all states are positive recurrent:
    π = πP (π_k = \sum_i π_i P_{i,k}).
- In a finite Markov chain:
  - at least one state is recurrent; and
  - all the recurrent states are positive recurrent.
- Any finite, irreducible, and aperiodic Markov chain has all states positive recurrent and aperiodic (called ergodic states); such a chain is called a finite ergodic Markov chain.
- A finite ergodic Markov chain has a unique stationary distribution; a small computational sketch follows.
- A Markov chain is called reversible if there exists a distribution π such that
  π_i P_{i,j} = π_j P_{j,i}.
  This distribution is also the stationary distribution.
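The stationary distribution can be computed as the left eigenvector of P for eigenvalue 1. A minimal MATLAB sketch (not from the slides; the two-state matrix is an arbitrary example):

  % Stationary distribution: pi = pi*P, i.e., pi' is an eigenvector of P'
  P = [0.9 0.1; 0.4 0.6];
  [V, D] = eig(P');
  [~, k] = min(abs(diag(D) - 1));   % eigenvalue closest to 1
  ppi = V(:, k) / sum(V(:, k));     % normalize to a probability vector
  disp(ppi')                        % here: 0.8 0.2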
- In a finite ergodic Markov chain,
  \sum_{j≠i} π_j P_{j,i} = \sum_{j≠i} π_i P_{i,j}.
- This implies that the probability that the chain leaves a state equals the probability that it enters the state.
- Proof: since π is the stationary distribution,
  π_i = \sum_j π_j P_{j,i} and π_i = π_i \sum_j P_{i,j},
  so
  \sum_{j≠i} π_j P_{j,i} + π_i P_{i,i} = \sum_{j≠i} π_i P_{i,j} + π_i P_{i,i}.
Examples of Markov processes:
- The state of a buffer:
  Q_t = max{0, Q_{t−1} + A_t − D_t},
  where Q_t is the state of the buffer, A_t is the arrival process, and D_t is the departure process.
- Autoregressive (AR) process: the AR(1) process (simulated below) is given by
  X_t = a X_{t−1} + W_t,
  where a is the AR coefficient and W_t is a white noise process.
- Birth-death process (a continuous-time Markov process)
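A minimal MATLAB sketch of an AR(1) sample path (not from the slides; a and T are arbitrary, and W_t is taken as Gaussian):

  % AR(1) sample path: the next state depends only on the current one
  a = 0.9; T = 1000;
  W = randn(T, 1);                 % white Gaussian noise
  X = zeros(T, 1);
  for t = 2:T
      X(t) = a * X(t-1) + W(t);
  end
  plot(X)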
Birth-death process
- For a queue with customers, we can model the variation of the number of customers as a birth-death process.
The arrival process is assumed to be a Poisson process:

Pr(a(t + τ) − a(t) = n) = \frac{e^{−λτ}}{n!} (λτ)^n,

where a(t) is the accumulated number of arrivals. For a sufficiently short time δ, we have

Pr(no arrival) = Pr(a(t + δ) − a(t) = 0) = 1 − λδ + o(δ)
Pr(one arrival) = Pr(a(t + δ) − a(t) = 1) = λδ + o(δ)
Pr(more than one arrival) = Pr(a(t + δ) − a(t) ≥ 2) = o(δ),

where \lim_{δ→0} o(δ)/δ = 0.
- The service time has the exponential distribution:
  f(s) = µ e^{−µs},
  where µ is the service rate, or T_s = 1/µ is the average service time. For a short time interval δ, we have
  Pr(s > δ) = e^{−µδ} = 1 − µδ + o(δ) = 1 − \frac{δ}{T_s} + o(δ).
- Define the probability P_{i,j} as
  P_{i,j} = Pr{N_{k+1} = j | N_k = i},
  where N_k represents the number of customers in service at the kth instant.
Noting that Pr(no departure) = Pr(s > δ),

P_{0,0} = Pr(no arrival) = 1 − λδ + o(δ)
P_{i,i} = Pr(no arrival) Pr(no departure) + Pr(arrival) Pr(departure)
        = (1 − λδ)(1 − µδ) + λδ µδ + o(δ)
        = 1 − (λ + µ)δ + o(δ)
P_{i,i+1} = Pr(arrival) Pr(no departure) = λδ(1 − µδ) + o(δ) = λδ + o(δ)
P_{i,i−1} = Pr(no arrival) Pr(departure) = (1 − λδ)(µδ) + o(δ) = µδ + o(δ)
A birth-death process: let p_n denote the probability of n customers. Then, in equilibrium, we have

p_{n−1} P_{n−1,n} = p_n P_{n,n−1} ⇒ p_{n−1} λ = p_n µ.

Iterating gives p_n = (λ/µ)^n p_0, which, after normalization, is the equilibrium distribution of the M/M/1 queue (for λ < µ).
Continuous-time random processes
- Example: a sine wave s(t) = sin(2πt + θ) is a random process when the phase θ is a random variable.
- A complete statistical description of a random process X(t) is known if, for any integer n and any choice of the sample instants (t_1, t_2, ..., t_n), the joint pdf
  f_{X(t_1),X(t_2),...,X(t_n)}(x_1, x_2, ..., x_n)
  is given.
- A random process X(t) is a Gaussian process if for any n and all (t_1, t_2, ..., t_n), the random variables X(t_1), X(t_2), ..., X(t_n) have a jointly Gaussian density function.
- A random process X(t) is a wide sense stationary (WSS) process if
  E[X(t)] = µ (independent of time t)
  and
  E[X(t) X(t + τ)] = R_X(τ),
  where R_X(τ) is called the autocorrelation function of X(t).
- The spectral density is defined as
  S_X(f) = \int_{−∞}^{∞} R_X(τ) e^{−j2πfτ} dτ.
- A process X(t) is called a white process if it has a flat spectral density, i.e., S_X(f) = C for all f. ⇒ Its autocorrelation is then
  R_X(τ) = C δ(τ).
In most cases, we deal with only some simple random processes (or random signals) in communications. Some examples are as follows:
- White Gaussian noise n(t): it is WSS and has the spectrum
  S_N(f) = \frac{N_0}{2} ∀f,
  where N_0/2 is called the "double-sided" spectral density.
- Random-phase sinusoid: the random signal is written as
  s(t) = sin(2πft + θ),
  where θ is a uniform r.v. over [0, 2π).
(Figure: a sample path of white noise)

(Figure: sine waves with random phases)
Q) Is s(t) = sin(2πft + θ) WSS?
A) We need to verify:
1. the mean (it should be constant over time),
2. the autocorrelation (it should be a function of the time difference only).
The mean is

E[sin(2πft + θ)] = \int_0^{2π} sin(2πft + θ) \frac{1}{2π} dθ = 0.
The autocorrelation is

E[sin(2πft + θ) sin(2πf(t + τ) + θ)]
= \int_0^{2π} sin(2πft + θ) sin(2πf(t + τ) + θ) \frac{1}{2π} dθ
= \frac{1}{2π} \int_0^{2π} \frac{1}{2} (cos(2πfτ) − cos(4πft + 2πfτ + 2θ)) dθ
= \frac{1}{2} cos(2πfτ).

⇒ E[s(t) s(t + τ)] = R_S(τ) = \frac{1}{2} cos(2πfτ).

This shows that sin(2πft + θ) is WSS.
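An ensemble-average check of this result; a MATLAB sketch (not from the slides; f, t0, and the τ grid are arbitrary choices, and the broadcasting relies on implicit expansion, available in R2016b and later):

  % Empirical R_S(tau) vs. (1/2) cos(2*pi*f*tau)
  f = 5; t0 = 0.13; tau = 0:0.01:0.4; N = 1e5;
  theta = 2*pi*rand(N, 1);
  R = mean(sin(2*pi*f*t0 + theta) .* sin(2*pi*f*(t0 + tau) + theta), 1);
  plot(tau, R, tau, 0.5*cos(2*pi*f*tau), '--')   % the curves should overlap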
Filtering of stochastic signals

Let h(t) be the impulse response of a linear system. The input signal to the linear system is a stochastic process X(t). Then, the output is written as

Y(t) = h(t) ∗ X(t) = \int_τ h(τ) X(t − τ) dτ.

- Mean: E[Y(t)] = h(t) ∗ E[X(t)]
- Autocorrelation function:
  R_Y(τ) = R_X(τ) ∗ h(τ) ∗ h(−τ)
- Power spectrum:
  S_Y(f) = |H(f)|^2 S_X(f).
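These relations can be seen numerically by filtering a white signal; a MATLAB sketch (not from the slides; the filter length and FFT size are arbitrary, and the raw periodogram is noisy but fluctuates around |H(f)|^2):

  % Filtering a white signal: the output spectrum follows |H(f)|^2
  N = 2^16;
  x = randn(N, 1);                 % approximately white input, S_X(f) ~ 1
  h = ones(32, 1) / 32;            % rectangular-pulse (moving-average) filter
  y = filter(h, 1, x);
  Y  = abs(fft(y)).^2 / N;         % periodogram of the output
  H2 = abs(fft(h, N)).^2;          % |H(f)|^2
  plot(10*log10(Y(1:N/2))); hold on; plot(10*log10(H2(1:N/2)))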
Proofs
- E[Y(t)] = h(t) ∗ E[X(t)]:
  E[Y(t)] = E\left[\int h(t − τ) X(τ) dτ\right] = \int h(t − τ) E[X(τ)] dτ = \int h(t − τ) µ(τ) dτ = h(t) ∗ E[X(t)].
i) R_Y(τ) = R_X(τ) ∗ h(τ) ∗ h(−τ):

R_Y(τ) = E[Y(t) Y(t − τ)]
       = E\left[\left(\int h(u) X(t − u) du\right)\left(\int h(v) X(t − τ − v) dv\right)\right]
       = \int\int h(u) h(v) E[X(t − u) X(t − τ − v)] du dv
       = \int\int h(u) h(v) R_X(τ + v − u) du dv
       = \int [h(τ + v) ∗ R_X(τ + v)] h(v) dv.

Let R_{YX}(τ) = h(τ) ∗ R_X(τ). Then,

R_Y(τ) = \int R_{YX}(τ + v) h(v) dv = h(−τ) ∗ R_{YX}(τ) = h(τ) ∗ h(−τ) ∗ R_X(τ).

ii) S_Y(f) = |H(f)|^2 S_X(f): directly obtained from the above by taking the Fourier transform.
A rectangular-pulse filter and the output spectrum

(Figure: a white signal in the time domain, filtered by a rectangular pulse, and the resulting output spectrum in the frequency domain)
Bandlimited process and sampling

Let X(t) be a stationary bandlimited process, i.e., S_X(f) = 0 for |f| ≥ W. Then, the following relation holds (mean-square stochastic convergence):

E\left|X(t) − \sum_{k=−∞}^{∞} X(kT_s) sinc(2W(t − kT_s))\right|^2 = 0,

where T_s = 1/2W.
Note: E[|X − S|^2] = 0 means that X and S are the same in the mean-square error sense.
Ergodic processes
- A strictly stationary (SS) process is a process in which, for all n, all (t_1, t_2, ..., t_n), and all Δ,
  f_{X(t_1),X(t_2),...,X(t_n)}(x_1, x_2, ..., x_n) = f_{X(t_1+Δ),X(t_2+Δ),...,X(t_n+Δ)}(x_1, x_2, ..., x_n).
  A process is called Mth-order stationary if the above condition holds for all n ≤ M.
- A SS process is a WSS process.
Two types of averages for a SS process X(t):
- Ensemble average:
  E[g(X(t_0))] = \int g(x) f_{X(t_0)}(x) dx.
  Note: E[g(X(t_0))] should be independent of t_0.
- Time average (over the ith realization x(t; w_i)):
  ⟨g(X(t_0))⟩_i = \lim_{T→∞} \frac{1}{T} \int_{−T/2}^{T/2} g(x(t; w_i)) dt.
Ergodic: if it happens that, for all functions g(x), ⟨g(X(t_0))⟩_i is independent of i and equal to E[g(X(t_0))], then the process is called ergodic.
- Power spectrum of an ergodic process:
  S_X(f) = E\left[\lim_{T→∞} \frac{|X_T(f)|^2}{T}\right] = \lim_{T→∞} \frac{E[|X_T(f)|^2]}{T},
  where X_T(f) = \int_{−T/2}^{T/2} X(t) e^{−j2πft} dt.