The Central Limit Theorem
Mitja Stadje
September, 2011
Mitja Stadje, QM Additional Material
Central Limit Theorem, CLT
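Let $X_1, X_2, \ldots$ be i.i.d. with $E X_i = \mu$ and $0 < \operatorname{Var} X_i = \sigma^2 < \infty$. Then, with $\bar X_n = \frac{1}{n}\sum_{i=1}^{n} X_i$,
\[
\frac{\sqrt{n}\,(\bar X_n - \mu)}{\sigma} \xrightarrow{d} N(0, 1).
\]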
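The statement can be checked by simulation. The following is a minimal sketch, assuming Exp(1) observations (so $\mu = \sigma = 1$); the sample size, number of repetitions, and the helper name `standardized_mean` are illustrative choices, not part of the slides. It compares the empirical cdf of $\sqrt{n}(\bar X_n - \mu)/\sigma$ with the standard normal cdf $\Phi$ at a few points.

```python
import math
import random
from statistics import NormalDist, fmean

def standardized_mean(n, rng):
    """Draw n Exp(1) observations and return sqrt(n) * (mean - mu) / sigma."""
    mu, sigma = 1.0, 1.0  # mean and standard deviation of Exp(1)
    xs = [rng.expovariate(1.0) for _ in range(n)]
    return math.sqrt(n) * (fmean(xs) - mu) / sigma

rng = random.Random(0)   # fixed seed so the run is reproducible
n, reps = 100, 4000      # illustrative values
draws = [standardized_mean(n, rng) for _ in range(reps)]

# Compare the empirical cdf of the standardized mean with Phi at a few points.
phi = NormalDist().cdf
for z in (-1.0, 0.0, 1.0):
    empirical = sum(d <= z for d in draws) / reps
    print(f"z={z:+.1f}  empirical={empirical:.3f}  Phi(z)={phi(z):.3f}")
```

The two columns should agree up to simulation noise and a small skewness effect that shrinks as n grows.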
CLT, Application I: Calculating probabilities of sums
If the distribution is known (in particular, the true mean $\mu$ and the true variance $\sigma^2$ are known), the CLT can be used to compute probabilities involving the sum of i.i.d. observations. For instance, suppose that the $X_i$ are i.i.d. with an exponential, t-, or F-distribution, or any other standard distribution. To compute $P[X_1 + X_2 + \ldots + X_{100} \le 5]$ we need to know the cdf of $X_1 + X_2 + \ldots + X_{100}$, i.e.,
\[
P[X_1 + X_2 + \ldots + X_{100} \le 5] = F_{X_1 + X_2 + \ldots + X_{100}}(5).
\]
The problem is that $F_{X_1 + X_2 + \ldots + X_{100}}$ is often very complicated to compute. However, using the CLT we know that
\[
\sum_{i=1}^{100} X_i \approx N(100\mu,\, 100\sigma^2).
\]
Therefore
\[
P\Big[\sum_{i=1}^{100} X_i \le 5\Big]
= P\Big[\frac{\sum_{i=1}^{100} X_i - 100\mu}{10\sigma} \le \frac{5 - 100\mu}{10\sigma}\Big]
\approx \Phi\Big(\frac{5 - 100\mu}{10\sigma}\Big).
\]
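For one distribution the exact answer is available, which makes it possible to see how good the approximation is. The sketch below assumes $X_i \sim$ Exp(rate) with an illustrative rate of 18 (not a value from the slides). The sum of 100 such variables has an Erlang distribution, whose cdf can be written as a finite Poisson sum. The helper name `erlang_cdf` is chosen here for illustration.

```python
import math
from statistics import NormalDist

def erlang_cdf(x, n, rate):
    """Exact cdf of a sum of n i.i.d. Exp(rate) variables (Erlang distribution).

    Uses P[S <= x] = 1 - sum_{k=0}^{n-1} exp(-rate*x) * (rate*x)^k / k!.
    """
    term = math.exp(-rate * x)   # k = 0 term of the Poisson sum
    total = term
    for k in range(1, n):
        term *= rate * x / k     # build (rate*x)^k / k! iteratively
        total += term
    return 1.0 - total

n, rate, threshold = 100, 18.0, 5.0    # illustrative values, not from the slides
mu, sigma = 1.0 / rate, 1.0 / rate     # mean and standard deviation of Exp(rate)

# CLT approximation: Phi((5 - n*mu) / (sqrt(n) * sigma))
clt_approx = NormalDist().cdf((threshold - n * mu) / (math.sqrt(n) * sigma))
exact = erlang_cdf(threshold, n, rate)
print(f"CLT approximation: {clt_approx:.4f}")
print(f"exact Erlang cdf:  {exact:.4f}")
```

With these values the standardized threshold is $(5 - 100\mu)/(10\sigma) = -1$, so the CLT value is $\Phi(-1)$.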
CLT, Application II: Confidence intervals for $\mu$
If $\mu$ is not known and we have i.i.d. observations $X_1, \ldots, X_n$, we can always estimate $\mu$ with $\bar X_n$. However, the question is: how good is this estimate? To answer this question one typically gives a whole interval around $\bar X_n$ in which $\mu$ should lie with probability, say, $1 - \alpha = 95\%$. This is called a confidence interval.
Confidence intervals:
- Give a better idea about the possible values of $\mu$ than a point estimator $\bar X_n$ alone.
- Can be used to evaluate the quality of the estimate. Larger intervals correspond to bad estimates and smaller intervals correspond to good estimates.
CLT, Application II: Confidence intervals for $\mu$ if $\mu$ is unknown
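Since $1 - \alpha = P[-z_{\alpha/2} \le Z \le z_{\alpha/2}]$ for $Z \sim N(0, 1)$, by the CLT also
\[
1 - \alpha \approx P\Big[-z_{\alpha/2} \le \frac{\sqrt{n}\,(\bar X_n - \mu)}{\sigma} \le z_{\alpha/2}\Big]. \tag{1}
\]
Solving $\frac{\sqrt{n}(\bar X_n - \mu)}{\sigma} \ge -z_{\alpha/2}$ for $\mu$ we get $\mu \le \bar X_n + z_{\alpha/2}\,\sigma/\sqrt{n}$.
Solving $\frac{\sqrt{n}(\bar X_n - \mu)}{\sigma} \le z_{\alpha/2}$ for $\mu$ we get $\mu \ge \bar X_n - z_{\alpha/2}\,\sigma/\sqrt{n}$. Therefore,
\[
1 - \alpha \approx P\Big[\bar X_n - \frac{z_{\alpha/2}\,\sigma}{\sqrt{n}} \le \mu \le \bar X_n + \frac{z_{\alpha/2}\,\sigma}{\sqrt{n}}\Big].
\]
So $\mu$ lies with probability approximately $1 - \alpha$ in the interval
\[
\Big[\bar X_n - \frac{z_{\alpha/2}\,\sigma}{\sqrt{n}},\; \bar X_n + \frac{z_{\alpha/2}\,\sigma}{\sqrt{n}}\Big].
\]
If $\sigma$ is not known, $\sigma^2$ is typically replaced by $S^2$.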
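The interval can be computed in a few lines. This is a minimal sketch under the assumptions of the derivation; the function name `clt_confidence_interval`, the simulated Exp sample, and the seed are illustrative and not part of the slides. When `sigma` is not supplied, the sample standard deviation $S$ is used in its place.

```python
import math
import random
from statistics import NormalDist, fmean, stdev

def clt_confidence_interval(xs, alpha=0.05, sigma=None):
    """Approximate (1 - alpha) confidence interval for the mean.

    If sigma is None, the sample standard deviation S replaces it.
    """
    n = len(xs)
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}
    s = stdev(xs) if sigma is None else sigma
    half_width = z * s / math.sqrt(n)
    centre = fmean(xs)
    return centre - half_width, centre + half_width

rng = random.Random(1)                                # fixed seed for reproducibility
sample = [rng.expovariate(0.5) for _ in range(400)]   # simulated data, true mean 2
low, high = clt_confidence_interval(sample)
print(f"95% interval for the mean: [{low:.3f}, {high:.3f}]")
```

Because the half-width is $z_{\alpha/2}\,\sigma/\sqrt{n}$, quadrupling the sample size halves the length of the interval.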
CLT, Application III: Hypothesis testing
Suppose that $\mu$ is unknown but $\sigma$ is known. We are given a constant $\mu_0$ (for instance $\mu_0 = 0$) and we want to test whether the true expectation of our observations is equal to $\mu_0$. That is, we want to test the hypothesis
\[
H_0 : \mu = \mu_0 \quad \text{against} \quad H_1 : \mu \ne \mu_0.
\]
Strategy: Assume for a moment that $H_0$ were true. The CLT yields
\[
T = \frac{\sqrt{n}\,(\bar X_n - \mu_0)}{\sigma} \approx Z \sim N(0, 1).
\]
So if $H_0$ is true, then
\[
1 - \alpha \approx P[-z_{\alpha/2} \le T \le z_{\alpha/2}].
\]
Typically $\alpha$ is chosen such that $1 - \alpha = 99\%$ or $95\%$. Therefore, if we observe in our outcomes that $T \notin [-z_{\alpha/2}, z_{\alpha/2}]$, it seems unlikely that the hypothesis $H_0$ is true. (If it were true, then with 95% probability $T \in [-z_{\alpha/2}, z_{\alpha/2}]$, which was not the case.) So in this case we reject $H_0$.
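The decision rule can be sketched as follows, assuming $\sigma$ is known. The function name `z_test` and the data are made up for illustration; the sample is far too small for the CLT to be trusted and only serves to make the arithmetic easy to follow.

```python
import math
from statistics import NormalDist, fmean

def z_test(xs, mu0, sigma, alpha=0.05):
    """Two-sided test of H0: mu = mu0 with known sigma.

    Returns the statistic T and whether H0 is rejected at level alpha.
    """
    n = len(xs)
    t = math.sqrt(n) * (fmean(xs) - mu0) / sigma
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}
    return t, abs(t) > z                      # reject iff T lies outside [-z, z]

# Made-up observations with sample mean 1.1 and n = 9.
data = [0.8, 1.3, 0.2, 1.9, 1.1, 0.7, 1.6, 0.9, 1.4]

t0, reject0 = z_test(data, mu0=0.0, sigma=1.0)   # T = 3 * 1.1 = 3.3
t1, reject1 = z_test(data, mu0=1.0, sigma=1.0)   # T = 3 * 0.1 = 0.3
print(f"H0: mu = 0  ->  T = {t0:.2f}, reject = {reject0}")
print(f"H0: mu = 1  ->  T = {t1:.2f}, reject = {reject1}")
```

With $z_{0.025} \approx 1.96$, the first hypothesis is rejected and the second is not.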
CLT, Application III: Hypothesis testing
So the CLT can be used to test the hypothesis $H_0$. If $H_0$ is rejected we can conclude that there is evidence supporting $H_1$. (The other direction does not hold.)
The region $[-z_{\alpha/2}, z_{\alpha/2}]$ (which under $H_0$ has 95% probability) is also called the acceptance region. The region $\mathbb{R} \setminus [-z_{\alpha/2}, z_{\alpha/2}]$ is called the rejection region of the test. $\alpha$ is called the significance of the test.
(a) In the setting above, what do you do if you know neither the mean nor the variance?
(b) How do you test a one-sided hypothesis such as $H_0 : \mu \le \mu_0$?
(c) Above we only assumed that the observations $X_1, \ldots, X_n$ are i.i.d. What happens if we do not know the variance but additionally know that our observations are normal? Could we then do even better?
Excursion
To prove the CLT we will need moment generating functions:
Definition
Suppose that $X$ is a random variable. The moment generating function (mgf) of $X$ is defined as
\[
M_X(t) = E[\exp(tX)].
\]
Why the name? $M'(t) = E[X \exp(tX)]$, so $M'(0) = E[X]$. $M''(t) = E[X^2 \exp(tX)]$, so $M''(0) = E[X^2]$; and so on.
Example: If $X \sim N(0, 1)$ then
\[
M_X(t) = E[\exp(tX)]
= \int_{-\infty}^{\infty} \exp(tx)\, f_X(x)\, dx
= \int_{-\infty}^{\infty} \exp(tx)\, \frac{1}{\sqrt{2\pi}} \exp(-x^2/2)\, dx.
\]
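\[
\begin{aligned}
M_X(t) &= \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}} \exp\big(-(x^2 - 2tx)/2\big)\, dx \\
&= \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}} \exp\big(-(x^2 - 2tx + t^2 - t^2)/2\big)\, dx \\
&= \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}} \exp\big(-((x - t)^2 - t^2)/2\big)\, dx \\
&= e^{t^2/2} \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}} \exp\big(-(x - t)^2/2\big)\, dx = e^{t^2/2}.
\end{aligned}
\]
The last integral equals 1 because the integrand is the density of $N(t, 1)$.
Other properties: If $X$ and $Y$ are independent then
\[
M_{X+Y}(t) = M_X(t)\, M_Y(t).
\]
Why? Therefore, if $X_1, \ldots, X_n$ is a random sample,
\[
M_{X_1 + X_2 + \ldots + X_n}(t) = M_{X_1}(t) \cdots M_{X_n}(t) = \big(M_{X_1}(t)\big)^n.
\]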
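These identities can be checked numerically. The sketch below is illustrative only: the helper names, the integration range, and the step counts are assumptions made here. It (1) integrates $E[\exp(tX)]$ for $X \sim N(0,1)$ with the midpoint rule and compares it with $e^{t^2/2}$, (2) recovers $E[X]$ and $E[X^2]$ from finite differences of the mgf at 0, and (3) verifies the product rule for two independent fair dice by exact enumeration.

```python
import math

def normal_mgf_numeric(t, lo=-12.0, hi=12.0, steps=20_000):
    """Approximate E[exp(tX)] for X ~ N(0, 1) with the midpoint rule."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * h
        total += math.exp(t * x - x * x / 2)
    return total * h / math.sqrt(2 * math.pi)

# (1) Numerical integral versus the closed form exp(t^2 / 2).
for t in (0.0, 0.5, 1.0, 2.0):
    print(f"t={t:.1f}  numeric={normal_mgf_numeric(t):.6f}  closed form={math.exp(t * t / 2):.6f}")

# (2) Finite differences at 0: M'(0) = E[X] = 0 and M''(0) = E[X^2] = 1.
eps = 1e-3
m_plus, m_zero, m_minus = (normal_mgf_numeric(s) for s in (eps, 0.0, -eps))
m1 = (m_plus - m_minus) / (2 * eps)
m2 = (m_plus - 2 * m_zero + m_minus) / eps**2
print(f"M'(0) ~ {m1:.4f}, M''(0) ~ {m2:.4f}")

# (3) Product rule for two independent fair dice, by exact enumeration.
faces = range(1, 7)

def die_mgf(t):
    return sum(math.exp(t * k) for k in faces) / 6

def two_dice_mgf(t):
    return sum(math.exp(t * (j + k)) for j in faces for k in faces) / 36

t = 0.3
print(f"M_(X+Y)(t)={two_dice_mgf(t):.6f}  M_X(t)*M_Y(t)={die_mgf(t) ** 2:.6f}")
```

The dice example works because independence lets the double sum over $(j, k)$ factor into a product of two single sums, which is the same reason the product rule holds in general.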
We will need the following result:
Theorem
Let $Y_1, Y_2, \ldots$ be a sequence of random variables with distribution functions $F_1, F_2, \ldots$ and mgfs $M_1, M_2, \ldots$. If the random variable $Y$ has distribution function $F$ and mgf $M$, then $\lim_{n \to \infty} M_n(t) = M(t)$ for all $t \in (-h, h)$, for some $h > 0$, implies $Y_n \xrightarrow{d} Y$.
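To see the theorem at work in the CLT setting, the sketch below assumes $X_i \sim$ Exp(1), whose mgf is $1/(1-t)$ for $t < 1$, and evaluates the mgf of the standardized sum $Y_n = (X_1 + \ldots + X_n - n)/\sqrt{n}$ in closed form. The choice of distribution, the value of $t$, and the function names are illustrative assumptions. The values should approach $e^{t^2/2}$, the mgf of $N(0, 1)$.

```python
import math

def mgf_exp1(t):
    """mgf of an Exp(1) random variable, finite only for t < 1."""
    if t >= 1:
        raise ValueError("mgf of Exp(1) is only finite for t < 1")
    return 1.0 / (1.0 - t)

def mgf_standardized_sum(t, n):
    """mgf of Y_n = (X_1 + ... + X_n - n) / sqrt(n) for i.i.d. Exp(1) X_i.

    Uses M_{X_1+...+X_n}(s) = M_X(s)^n with s = t / sqrt(n), and
    M_{aX+b}(t) = exp(b*t) * M_X(a*t) for the centring and scaling.
    """
    s = t / math.sqrt(n)
    return math.exp(-t * math.sqrt(n)) * mgf_exp1(s) ** n

t = 0.5
limit = math.exp(t * t / 2)   # mgf of N(0, 1) at t
for n in (1, 10, 100, 10_000):
    print(f"n={n:>6}  M_n(t)={mgf_standardized_sum(t, n):.6f}  limit={limit:.6f}")
```

Taking logarithms shows why: $\log M_n(t) = -t\sqrt{n} - n \log(1 - t/\sqrt{n}) = t^2/2 + t^3/(3\sqrt{n}) + \ldots$, so the gap to $t^2/2$ vanishes at rate $1/\sqrt{n}$.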