
Law of large numbers. Central limit theorem. Confidence interval.

Transcript
Page 1: Law of large numbers.  Central limit theorem. Confidence interval.


•Law of large numbers.

•Central limit theorem.

•Confidence interval.

•Hypothesis testing. Types of errors.

Lecture 9.

Page 2: Law of large numbers.  Central limit theorem. Confidence interval.

Some practice with the Normal distribution

(please open Lect9_ClassPractice)

1. Math aptitude scores are distributed as N(500, 100) (mean 500, standard deviation 100).

Find: a. the probability that an individual score exceeds 600, and b. the probability that an individual score exceeds 600 given that it exceeds 500.

2. What mathematics SAT score can be expected to occur with probability 0.90 or greater?
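These practice problems can be checked numerically. Below is a sketch using Python's standard-library `statistics.NormalDist` (the class materials use Mathematica, so this translation is an assumption); problem 2 is read as asking for the score that is exceeded with probability 0.90, i.e. the 10th percentile.

```python
from statistics import NormalDist

# N(500, 100): mean 500, standard deviation 100
scores = NormalDist(mu=500, sigma=100)

# 1a. P(X > 600)
p_above_600 = 1 - scores.cdf(600)

# 1b. P(X > 600 | X > 500) = P(X > 600) / P(X > 500)
p_conditional = p_above_600 / (1 - scores.cdf(500))

# 2. The score exceeded with probability 0.90 is the 10th percentile
score_90 = scores.inv_cdf(0.10)

print(round(p_above_600, 4))   # ≈ 0.1587
print(round(p_conditional, 4)) # ≈ 0.3173
print(round(score_90, 1))      # ≈ 371.8
```

Note that the conditional probability in 1b is exactly twice the unconditional one, since P(X > 500) = 1/2 for a distribution centered at 500.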

Page 3: Law of large numbers.  Central limit theorem. Confidence interval.

The (weak) law of large numbers

(The following result is also known as the Bernoulli theorem.)

Let X1, X2, ..., Xn be a sequence of independent and identically distributed (iid) random variables, each having mean E(Xi) = μ and standard deviation σ. Define a new variable, the "sample mean":

X̄_n = (X1 + X2 + ... + Xn)/n    (9.1)

One should keep in mind that X̄_n is also a random variable: it changes randomly between different samples of the same size n, and those changes are sensitive to the sample size n.

Example (Lect9_LargeNumbers.nb).

Page 4: Law of large numbers.  Central limit theorem. Confidence interval.

Using the properties of expectations it is easy to prove that

E X̄_n = (1/n)(E X1 + E X2 + ... + E Xn) = μ    (9.2)

Using the properties of variances (see Lect. 8) we can also prove that

Var(X̄_n) = (1/n²) Var(X1 + X2 + ... + Xn)
          = (1/n²) {Var(X1) + Var(X2) + ... + Var(Xn)} = σ²/n    (9.3)

Then, using Chebyshev's inequality (which we use without proof), one can find:

Page 5: Law of large numbers.  Central limit theorem. Confidence interval.

P(|X̄_n − μ| ≥ ε) ≤ Var(X̄_n)/ε² = σ²/(n ε²) → 0 as n → ∞    (9.4)

where ε is any (but typically small) positive number. Let us discuss this result. The left side contains the probability that the deviation between the sample average X̄_n and the mean value of the individual random variable, μ, exceeds ε. The right-hand side shows that this probability decreases as 1/n for large n.

Thus, one can say that "X̄_n converges to μ in probability".

(Weak) Law of Large Numbers. Suppose X1, X2, ..., Xn are i.i.d. with finite E(Xi) = μ. Then, as n → ∞, X̄_n converges to μ in probability.    (9.5)

Page 6: Law of large numbers.  Central limit theorem. Confidence interval.


(9.5) could be called the fundamental theorem of statistics because it says that the sample mean is close to the mean of the underlying population when the sample is large.

It implies that we do not have to measure the weights of all American 30-year-old males in order to find their average weight. We can use the weights of 1000 individuals chosen at random.

Later on we will discuss how close the mean on the sample of this size will be to the mean of the underlying population.
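The convergence of the sample mean can be illustrated with a short simulation (a sketch in Python; the lecture's own demonstrations use Mathematica notebooks and the dice/coin figures below):

```python
import random

# Law of large numbers sketch: sample means of fair-die rolls
# drift toward the population mean mu = 3.5 as n grows.
random.seed(0)

def sample_mean(n):
    """Average of n fair-die rolls: one observation of the sample mean."""
    return sum(random.randint(1, 6) for _ in range(n)) / n

for n in (10, 1000, 100000):
    print(n, sample_mean(n))
```

Typical output shows the n = 10 average wandering far from 3.5 while the n = 100000 average sits within a few hundredths of it, in line with the 1/n decay of (9.4).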

Page 7: Law of large numbers.  Central limit theorem. Confidence interval.


Dice average

Page 8: Law of large numbers.  Central limit theorem. Confidence interval.


Coins- Average (relative to ½)

Page 9: Law of large numbers.  Central limit theorem. Confidence interval.


Coins average (2)

Page 10: Law of large numbers.  Central limit theorem. Confidence interval.

Experiments with dice and coins show that the rate of convergence to the average value is quite slow.

Using (9.4) we can understand why this is so.

Suppose that we flip a coin n = 1000 times and let Xi be 1 if the i-th toss is Heads and 0 otherwise. The Xi are Bernoulli random variables with p = P(Xi = 1) = 0.5, so E(Xi) = p = 0.5 and

Var(Xi) = (1 − p)² p + p² (1 − p) = (1 − p) p (1 − p + p) = p(1 − p) = 1/4.

Taking ε = 0.05 (which is 10% of the expected value), we find:

P(|X̄_n − 1/2| ≥ 0.05) ≤ (1/4) / (10³ · (0.05)²) = 0.1

We can see that the probability of a comparatively large fluctuation (10% of the mean value) is not too small. This explains the large irregularities that we can see in the pictures.

Page 11: Law of large numbers.  Central limit theorem. Confidence interval.

Central Limit Theorem

The Law of Large Numbers indicated that the sample average converges in probability to the population average. Now our concern is the probability distributions. Suppose that we know the distribution of the i.i.d. Xi. We wonder how the distribution of the sum of the Xi is related to the individual distributions.

Reiteration: assuming we know pdf(Xi), what can we tell about pdf(X̄_n) for large n?

Central Limit Theorem (first formulation)

Suppose X1, X2, ..., Xn are i.i.d. with E(Xi) = μ and finite Var(Xi) = σ². Then, as n → ∞,

P( (X̄_n − μ) / (σ/√n) ≤ x ) → P(χ ≤ x)    (9.5)

where χ denotes a random variable with the standard normal distribution.

Page 12: Law of large numbers.  Central limit theorem. Confidence interval.

Another formulation (which is essentially the same, but provides some additional insight):

Let Sn = X1 + X2 + ... + Xn be the sum of n i.i.d. variables with E(Xi) = μ and Var(Xi) = σ². Then, for any x,

P( (Sn − nμ) / (σ√n) ≤ x ) → (1/√(2π)) ∫_{−∞}^{x} Exp[−z²/2] dz    (9.6)

We already know from before that

P(X ≤ x) = ∫_{−∞}^{x} pdf(z) dz    (A)

is the probability function for a continuous distribution (see also "cumulative distribution function").

Let us now compare equations (9.6) and (A). *Please notice that z in these equations is a "silent" (dummy) variable: it can be renamed to any other letter without affecting the outcome of the integration. In my experience this naming issue often causes confusion.

Page 13: Law of large numbers.  Central limit theorem. Confidence interval.

P(X ≤ x) = Integrate[pdf[z], {z, -Inf, x}]    (A)

Now we are well equipped for the final step. Recall (9.6):

P( (Sn − nμ) / (σ√n) ≤ x ) → (1/√(2π)) ∫_{−∞}^{x} Exp[−z²/2] dz    (9.6)

Comparing (9.6) and (A) we discover that the distribution function of the standardized random variable

(Sn − nμ) / (σ√n)

asymptotically approaches the standard normal distribution.

Comment: Sn can be a dimensional variable: height, weight, gas pressure, etc. Try proving that (Sn − nμ)/(σ√n) is always dimensionless.

Page 14: Law of large numbers.  Central limit theorem. Confidence interval.

Examples

Example 1: Suppose we flip a coin 900 times. What is the probability of getting at least 465 heads?

By the CLT,

P( (Sn − nμ)/(σ√n) ≥ a ) ≈ (1/√(2π)) ∫_{a}^{∞} Exp[−x²/2] dx

Here p = 0.5, μ = p = 0.5, σ² = p(1 − p) = 0.25, σ = 0.5;
nμ = 900 · 0.5 = 450, σ√n = 0.5 · 30 = 15;
a = (465 − 450)/15 = 1, and P(χ ≥ 1) ≈ 0.159.

Check it!
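One way to check it: compare the CLT approximation with the exact binomial probability. A sketch (the exact-sum comparison is an addition, not part of the slide):

```python
from math import comb
from statistics import NormalDist

n, p = 900, 0.5
mu, sigma = n * p, (n * p * (1 - p)) ** 0.5   # 450 and 15

# CLT approximation: P(S_n >= 465) ≈ P(chi >= (465 - 450)/15) = P(chi >= 1)
approx = 1 - NormalDist().cdf((465 - mu) / sigma)   # ≈ 0.159

# Exact binomial probability for comparison
exact = sum(comb(n, k) for k in range(465, n + 1)) * 0.5 ** n

print(round(approx, 4), round(exact, 4))
```

The exact value comes out slightly larger than the approximation; applying a continuity correction (using 464.5 instead of 465) closes most of the gap.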

Page 15: Law of large numbers.  Central limit theorem. Confidence interval.


Page 16: Law of large numbers.  Central limit theorem. Confidence interval.


As a matter of fact, the CLT indicates that such an assumption is not completely unreasonable.

According to CLT, “the sums of independent random variables tend to look normal no matter what crazy distribution the individual variables have” (G&S,p.362).

Page 17: Law of large numbers.  Central limit theorem. Confidence interval.


Question for practice (G&S,9.3.7)

Extra-credit

Choose independently 25 numbers from the interval [0,1] with the probability density f(x) given below and compute their sum s25. Repeat this experiment 100 times and make a bar graph of the result. How well does the normal density fit your bar graph in each case?

(a) f(x)=1

(b) f(x) = 3 x2

(c) f(x) = 2 - 4|x-1/2|
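For case (b), samples can be drawn by inverse-transform sampling: F(x) = x³ on [0, 1], so X = U^(1/3) for U uniform. A sketch of the experiment (the sampling method and seed are assumptions; cases (a) and (c) would follow the same pattern):

```python
import random

# G&S 9.3.7, density (b): f(x) = 3x^2 on [0, 1].
# Inverse transform: F(x) = x^3, so X = U**(1/3) with U uniform on [0, 1].
random.seed(2)

def s25():
    """One sum of 25 independent draws from f(x) = 3x^2."""
    return sum(random.random() ** (1 / 3) for _ in range(25))

sums = [s25() for _ in range(100)]
mean = sum(sums) / 100
print(round(mean, 2))   # should be near 25 * E[X] = 25 * 3/4 = 18.75
```

A histogram (bar graph) of `sums` should already look roughly normal at n = 25, which is the point of the exercise.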

Page 18: Law of large numbers.  Central limit theorem. Confidence interval.


CLT in statistics

Page 19: Law of large numbers.  Central limit theorem. Confidence interval.


Confidence Interval

In statistics we always deal with limited samples of a population. Usually the goal is to draw inferences about a population from a sample. This approach is called "inferential statistics".

Suppose that we are interested in the mean number of words that can be remembered by a high school student.

First, we have to select a random sample from the population. Suppose that the group of 15 students is chosen.

We can use the sample mean as an unbiased estimate of μ, the population mean value. However, given the small size of the sample, it will certainly not be a perfect estimate. By chance it is bound to be at least either a little bit too high or a little bit too low.

Page 20: Law of large numbers.  Central limit theorem. Confidence interval.


For the estimate of μ to be of value, one must have some idea of how precise it is. That is, how close to μ is the estimate likely to be?

The CLT helps answer this question. It tells us that the sample mean is a random variable, approximately normally distributed near the population mean. This can help us derive a probabilistic estimate of μ based on the sample mean.

An excellent way to specify the precision is to construct a confidence interval.

4.1 Confidence interval, σ known

Let's assume for simplicity that we have to estimate μ while σ is known (this case is not very realistic but can give us an idea).

Page 21: Law of large numbers.  Central limit theorem. Confidence interval.

Noticing that the standardized variable is

(X̄_n − μ) / (σ/√n),

it can be shown (check it with Mathematica) that for the standard normal distribution

P(−2 ≤ χ ≤ 2) ≈ 0.95*

so we have

P( X̄_n − 2σ/√n ≤ μ ≤ X̄_n + 2σ/√n ) ≈ 0.95

*More precisely, the 95% range is (−1.96, 1.96), but using (−2, 2) is good enough for all practical purposes.

Page 22: Law of large numbers.  Central limit theorem. Confidence interval.

In other words, with probability ~0.95, μ belongs to the interval

[X̄_n − 2σ/√n, X̄_n + 2σ/√n]

We say that this is a 95% confidence interval for μ.

Page 23: Law of large numbers.  Central limit theorem. Confidence interval.

Example 1.

Suppose that we have a random sample of size 100 from a population with standard deviation σ = 3 and we observe a sample mean X̄_100 = 66.32. What can we tell about μ?

In this case 2σ/√n = 2 · 3/10 = 0.6.

As a result, the 95% confidence interval is 66.32 ± 0.6.
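The interval construction is mechanical enough to wrap in a small helper; a sketch using the 2σ rule from the slides (the function name is an invention for illustration):

```python
from math import sqrt

def confidence_interval(sample_mean, sigma, n):
    """95% confidence interval for mu with known sigma (2-sigma rule)."""
    half_width = 2 * sigma / sqrt(n)
    return (sample_mean - half_width, sample_mean + half_width)

# Example 1: n = 100, sigma = 3, observed sample mean 66.32
lo, hi = confidence_interval(66.32, 3, 100)
print(round(lo, 2), round(hi, 2))   # 65.72 66.92
```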

Page 24: Law of large numbers.  Central limit theorem. Confidence interval.

Example 2.

How large a sample must be selected from a normal distribution with standard deviation σ = 12 in order to estimate μ to within 2 units?

We need 2σ/√n ≤ 2, i.e. √n ≥ σ = 12, so n ≥ 144.
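The same rearrangement gives a general sample-size rule, n ≥ (2σ/d)² for a target half-width d; a sketch (the helper name is an invention):

```python
from math import ceil

def sample_size(sigma, half_width):
    """Smallest n with 2*sigma/sqrt(n) <= half_width (2-sigma 95% rule)."""
    return ceil((2 * sigma / half_width) ** 2)

print(sample_size(12, 2))   # 144
```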

Page 25: Law of large numbers.  Central limit theorem. Confidence interval.

In the previous example it was assumed that the variance is known. It is much more common that neither the mean nor the variance is known when the sampling is done. As a rough estimate, one can use the sample variance:

s² = [ (x1 − X̄_n)² + (x2 − X̄_n)² + ... + (xn − X̄_n)² ] / n

This leads to the normalized variable

T_n = (S_n − nμ) / (s√n)

which is a function of two random variables.

Page 26: Law of large numbers.  Central limit theorem. Confidence interval.

It was shown by W. S. Gosset that the density function for T_n is not normal but rather a t-density with n degrees of freedom. For large n the t-density approaches the normal distribution. In general, the t-density is closely related to the Chi-squared distribution:

Chi-squared distribution

f(x) = x^{r/2 − 1} e^{−x/2} / ( 2^{r/2} Γ(r/2) ),   x > 0

This is called the Chi-squared distribution with r degrees of freedom. Here

Γ(r/2) = ∫_0^∞ x^{r/2 − 1} e^{−x} dx;   Γ(r) = (r − 1)! if r is a positive integer.

Page 27: Law of large numbers.  Central limit theorem. Confidence interval.


This distribution describes the sum of squares of n independent standard normal random variables.

It is very important for comparing experimental data with a theoretical discrete distribution to see whether the data support the theoretical model.

Please read Chapter 7.2 of G&S for details. Here we consider only simple examples of hypothesis testing.

Page 28: Law of large numbers.  Central limit theorem. Confidence interval.

Hypothesis testing

We now consider two types of hypothesis testing: testing the mean and testing the difference between two means.

Page 29: Law of large numbers.  Central limit theorem. Confidence interval.

Example 1. Suppose we run a casino and we wonder if our roulette wheel is biased. To rephrase our question in statistical terms, let p be the probability that red comes up, and introduce two hypotheses:

H0: p = 18/38    null hypothesis

H1: p ≠ 18/38    alternative hypothesis

To test whether the null hypothesis is true, we spin the roulette wheel n times and let Xi = 1 if red comes up on the i-th trial and 0 otherwise, so that X̄_n (the observed mean value) is the fraction of times red came up in the first n trials.

Page 30: Law of large numbers.  Central limit theorem. Confidence interval.

How do we decide if H0 is correct?

It is reasonable to suggest that large deviations of X̄_n from the "fair" value 18/38 would indicate that H0 fails. But how do we decide which deviations should be considered "large"?

The concept of the confidence interval can help answer this question. As we know, with probability ~95%, X̄_n should belong to the confidence interval

{18/38 − 2σ/√n, 18/38 + 2σ/√n}    i.e. {μ − 2σ/√n, μ + 2σ/√n}

Reminder: n is the sample size (the number of the iid, independent and identically distributed, variables added together), and μ and σ are their individual mean and standard deviation.

Page 31: Law of large numbers.  Central limit theorem. Confidence interval.

Thus, we can say that if X̄_n falls outside the chosen interval, then H0 can be rejected with possible error less than 5%. In this test we have set the type I error to be 5%.

Rejecting H0 when it is true is called a type I error. Accepting H0 when the alternative hypothesis is true is called a type II error.

Let us now find the individual standard deviation. The variable Xi takes only two values, 0 and 1, with Probability(1) = 18/38 and Probability(0) = 20/38. Using this, we can find

<Xi²> = 1·(18/38) + 0·(20/38) = 18/38   (I use <...> notation for the mean.)

Page 32: Law of large numbers.  Central limit theorem. Confidence interval.

σ² = <Xi²> − <Xi>² = 18/38 − (18/38)² = (18/38)(20/38)

σ = √( (18/38)(20/38) ) ≈ 0.5

Thus 2σ ≈ 1 and the test can now be formulated as:

reject H0 if |X̄_n − 18/38| > 1/√n

or, in terms of the total number of reds Sn = X1 + X2 + ... + Xn,

reject H0 if |Sn − (18/38)n| > √n

Page 33: Law of large numbers.  Central limit theorem. Confidence interval.

Suppose that we spin the wheel 3800 times and get red 1869 times. Is the wheel biased? The expected number of reds is (18/38)·n = 1800. Does the deviation from this value,

|Sn − 1800| = 69,

indicate that the wheel is biased? Given the large number of trials, this deviation might not seem large. However, it must be compared with √n = √3800 ≈ 61.6.

Given that 69 > 61.6, we reject H0, with the understanding that if H0 were correct we would see an observation this far from the mean less than 5% of the time.

Obviously, this does not prove the wheel is biased, but it shows that it is very likely to be true.
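The rejection rule from the roulette example can be sketched as a small function (the function name is an invention; it uses the 2σ ≈ 1 simplification from the slides):

```python
from math import sqrt

def roulette_biased(n_spins, n_reds):
    """Reject H0 (p = 18/38) at the ~5% level if |S_n - 18n/38| > sqrt(n).

    Uses the slides' 2*sigma ~ 1 simplification for Bernoulli(18/38).
    """
    expected = 18 / 38 * n_spins
    return abs(n_reds - expected) > sqrt(n_spins)

print(roulette_biased(3800, 1869))   # True: 69 > 61.6, reject H0
```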

Page 34: Law of large numbers.  Central limit theorem. Confidence interval.


Example 2.

Do married college students with children do less well because they have less time to study, or do they do better because they are more serious?

The average GPA for all students was 2.48. To answer the question, we formulate two hypotheses:

H0: μ = 2.48    null hypothesis

H1: μ ≠ 2.48    alternative hypothesis

Page 35: Law of large numbers.  Central limit theorem. Confidence interval.

The records of 25 married college students with children indicate that their GPA average was 2.36 with a standard deviation of 0.5.

Following the last example, to draw a conclusion with a type I error of 5% we should reject H0 if

|X̄_n − 2.48| > 2σ/√n = 2 · 0.5/√25 = 0.2

The observed deviation is |X̄_n − 2.48| = |2.36 − 2.48| = 0.12 < 0.2. It means that we cannot reject the null hypothesis. In other words, we cannot be 95% certain that married students with children do either worse or better than the students without children.
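The same 2σ rule generalizes to any one-sample test of the mean; a sketch (the function name is an invention for illustration):

```python
from math import sqrt

def reject_mean(sample_mean, mu0, sigma, n):
    """Two-sided test of H0: mu = mu0 at the ~5% level (2-sigma rule)."""
    return abs(sample_mean - mu0) > 2 * sigma / sqrt(n)

# GPA example: 0.12 < 0.2, so we cannot reject H0
print(reject_mean(2.36, 2.48, 0.5, 25))   # False
```

Note that with only 25 students and an estimated σ, a t-distribution threshold would strictly be more appropriate, as the earlier Gosset discussion suggests; the 2σ rule is the slides' simplification.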

Page 36: Law of large numbers.  Central limit theorem. Confidence interval.


Testing the difference of two means

Suppose that we want to compare the test results for two independent random samples of sizes n1 and n2 taken from two different populations. A typical example would be testing a drug: one sample is taken from the population taking the drug and another from the reference group.

Suppose now that the population means are unknown. For example, we do not know what percentage of the population would be infected by a certain disease if the drug is not taken.

Notice that in the previous case we were able to calculate the mean for the null hypothesis. Now we cannot.

Page 37: Law of large numbers.  Central limit theorem. Confidence interval.

H0: μ1 = μ2    null hypothesis

H1: μ1 ≠ μ2    alternative hypothesis

From the CLT:

X̄1 ~ N(μ1, σ1²/n1);   X̄2 ~ N(μ2, σ2²/n2)

If H0 is correct, then

X̄1 − X̄2 ~ N(0, σ1²/n1 + σ2²/n2)

Page 38: Law of large numbers.  Central limit theorem. Confidence interval.

Based on the last result, if we want a test with a type I error of 5%, then we should reject H0 if

|X̄1 − X̄2| > 2 √( σ1²/n1 + σ2²/n2 )

Example: a study of passive smoking reported in the NEJM. The size of lung airways was measured for 200 female nonsmokers who were in a smoky environment and for 200 who were not. For the first group the average was 2.72 with σ = 0.71, while for the second group the corresponding values were 3.17 and 0.74.

Page 39: Law of large numbers.  Central limit theorem. Confidence interval.

Based on these data we find

2 √( σ1²/n1 + σ2²/n2 ) = 2 √( 0.71²/200 + 0.74²/200 ) ≈ 0.145

while |X̄1 − X̄2| = 0.45 noticeably exceeds 0.145.

This means that the difference is significant: the conclusions are convincing (should be taken seriously).
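The two-mean test can be sketched the same way as the one-sample test (the function name is an invention; it implements the 2σ threshold above):

```python
from math import sqrt

def two_mean_test(mean1, sigma1, n1, mean2, sigma2, n2):
    """Reject H0: mu1 = mu2 at the ~5% level if the observed gap
    exceeds 2 * sqrt(sigma1^2/n1 + sigma2^2/n2)."""
    threshold = 2 * sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    return abs(mean1 - mean2) > threshold, threshold

# Passive-smoking example: gap 0.45 vs threshold ~0.145
reject, threshold = two_mean_test(2.72, 0.71, 200, 3.17, 0.74, 200)
print(reject, round(threshold, 3))   # True 0.145
```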

