Page 1: Sampling and estimation

Sampling and estimation

Petter Mostad

2005.09.26

Page 2: The normal distribution

• The most used continuous probability distribution:
  – Many observations tend to approximately follow this distribution
  – It is easy and nice to do computations with
  – BUT: Using it can result in wrong conclusions when it is not appropriate

Page 3:

[Figure: The normal density curve, with the mean μ marked at the center and μ-2σ and μ+2σ marked on either side]

Page 4: The normal distribution

• The probability density function is

  f(x) = (1/√(2πσ²)) e^(-(x-μ)²/(2σ²))

• where E(X) = μ and Var(X) = σ²

• Notation: N(μ, σ²)

• Standard normal distribution: N(0,1)

• Using the normal density is often OK unless the actual distribution is very skewed

Page 5: Normal probability plots

• Plotting the quantiles of the data versus the quantiles of the distribution.

• If the data is approximately normally distributed, the plot will approximately show a straight line

[Figure: Normal Q-Q plot of household income in thousands; observed values plotted against expected normal quantiles]
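A minimal sketch of how such a plot can be made (not from the slides; it assumes numpy, scipy, and matplotlib are available, and uses simulated stand-in data for the income):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng()
income = rng.normal(loc=500, scale=200, size=200)  # stand-in for the income data

# Plot the quantiles of the data against the quantiles of the normal distribution
stats.probplot(income, dist="norm", plot=plt)
plt.title("Normal Q-Q plot")
plt.show()  # approximately a straight line when the data is normal
```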

Page 6: The Normal versus the Binomial distribution

• When n is large and π is not too close to 0 or 1, then the Binomial distribution becomes very similar to the Normal distribution with the same expectation and variance.

• This is a phenomenon that happens for all distributions that can be seen as a sum of independent observations.

• It can be used to make approximate computations for the Binomial distribution, as in the check below.
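A small numerical check of the approximation (my own illustration; n=100 and π=0.3 are arbitrary choices, and scipy is assumed):

```python
from scipy.stats import binom, norm

n, p = 100, 0.3                           # n large, p not too close to 0 or 1
mu = n * p                                # matching expectation: 30
sigma = (n * p * (1 - p)) ** 0.5          # matching standard deviation: sqrt(21)

x = 35
exact = binom.cdf(x, n, p)                # exact P(X <= 35)
approx = norm.cdf(x + 0.5, mu, sigma)     # normal approximation, continuity-corrected
print(exact, approx)                      # both are roughly 0.88
```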

Page 7: The Exponential distribution

• The exponential distribution is a distribution for positive numbers (parameter λ):

  f(t) = λe^(-λt),  E(T) = 1/λ,  Var(T) = 1/λ²

• It can be used to model the time until an event, when events arrive randomly at a constant rate

Page 8: Sampling

• We need to start connecting the probability models we have introduced with the data we want to analyze.

• We (usually) want to regard our data as a simple random sample from a probability model:
  – Each observation is sampled independently of the others
  – Each observation is sampled from the same probability model

• Thus we go on to study the properties of simple random samples.

Page 9: Example: The mean of a random sample

• If X1,X2,…,Xn is a random sample, then their sample mean is defined as

  X̄ = (1/n)(X1 + X2 + … + Xn)

• As it is a function of random variables, it is a random variable.

• If E(Xi) = μ, then E(X̄) = μ

• If Var(Xi) = σ², then Var(X̄) = σ²/n

Page 10: Example

• Assume X1,X2,…,X10 is a random sample from the binomial distribution Bin(20,0.2)

• We get

  E(Xi) = 20·0.2 = 4   Var(Xi) = 20·0.2·(1-0.2) = 3.2

  E(X̄) = 4   Var(X̄) = 3.2/10 = 0.32

[Figure: The probability distribution Bin(20, 0.2)]

Page 11: Simulation

• Simulation: To generate outcomes by computer, on the basis of pseudo-random numbers

• Pseudo-random number: Generated by an algorithm completely unrelated to the way the numbers are used, so that they appear random. Usually generated to be uniformly distributed between 0 and 1.

• There is a correspondence between random variables and algorithms to simulate outcomes.

Page 12: Examples

• To simulate outcomes 1,2,…,6 each with probability 1/6: Simulate pseudo-random u in [0,1), and let the outcome be i if u is between (i-1)/6 and i/6.

• To simulate exponentially distributed X with parameter λ: Simulate pseudo-random u in [0,1), and compute x = -log(u)/λ (both recipes are implemented in the sketch below)
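A direct Python transcription of these two recipes (the function names are mine):

```python
import math
import random

def simulate_die():
    """Outcome i in 1..6 when u falls between (i-1)/6 and i/6."""
    u = random.random()          # pseudo-random u in [0, 1)
    return int(u * 6) + 1

def simulate_exponential(lam):
    """Exponentially distributed value with parameter lam: x = -log(u)/lam."""
    u = 1.0 - random.random()    # in (0, 1], so log(u) is always defined
    return -math.log(u) / lam

print([simulate_die() for _ in range(10)])
print(simulate_exponential(2.0))
```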

Page 13: Stochastic variables and simulation of outcomes

The histogram of n simulated values will approach the probability distribution simulated from, as n increases

[Figure: Histograms of n simulated values for n=100, n=1000, and n=100000, next to the probability distribution simulated from]

Page 14: Using simulation to study properties of samples

• We saw how we can theoretically derive the expectation and variance of some functions of a sample

• Instead, we can simulate the function of the sample a large number of times, and study the distribution of these numbers: This gives approximate results.

Page 15: Example

• X1,X2,…,X10 is a random sample from the binomial distribution Bin(20,0.2)

• Simulating these 100000 times, and computing X̄ each time, we get the histogram below

• The average of these 100000 numbers is 4.001, the variance is 0.3229

[Figure: Histogram of the 100000 simulated values of X̄]
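A sketch of this simulation with numpy (an illustration rather than the original code; results vary slightly from run to run):

```python
import numpy as np

rng = np.random.default_rng()

# 100000 samples, each of 10 draws from Bin(20, 0.2)
samples = rng.binomial(n=20, p=0.2, size=(100_000, 10))
means = samples.mean(axis=1)    # X-bar for each sample

print(means.mean())             # close to E(X-bar) = 4
print(means.var())              # close to Var(X-bar) = 0.32
```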

Page 16: Studying the properties of averages

• If X1,X2,…,Xn is a random sample from some distribution, it is very common to want to study the mean

• In the following example, we have sampled from the Exponential distribution with parameter λ=1 (a simulation sketch follows below):
  – First (done 10000 times) taken the average of 3 samples
  – Then (done 10000 times) taken the average of 30 samples
  – Then (done 10000 times) taken the average of 300 samples
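A sketch of this experiment with numpy (again my own illustration, not the original code):

```python
import numpy as np

rng = np.random.default_rng()

# For each sample size n, take the average of n draws from Exp(1), 10000 times
for n in (3, 30, 300):
    averages = rng.exponential(scale=1.0, size=(10_000, n)).mean(axis=1)
    print(n, averages.mean(), averages.var())   # mean near 1, variance near 1/n
```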

Page 17:

[Figure: Histograms of the 10000 simulated averages: the Exp. distribution with λ=1 itself, then averages of 3, of 30, and of 300 samples]

Page 18: The Central Limit Theorem

• It is a very important fact that the above happens no matter what distribution you start with.

• The theorem states: If X1,X2,…,Xn is a random sample from a distribution with expectation μ and variance σ², then

  Z = (X̄ - μ) / (σ/√n)

  approaches a standard normal distribution when n gets large.

Page 19: Example

• Let X be from Bin(n, π):

• X/n can be seen as the average over n Bernoulli variables, so we can apply the theory

• We get that when n grows, the expression

  (x/n - π) / √(π(1-π)/n)

  gets an approximate standard normal distribution N(0,1).

• A rule for when to accept the approximation: nπ(1-π) > 9

Page 20: The sampling distribution of the sample variance

• Recall: the sample variance is

  S² = (1/(n-1)) Σ (Xi - X̄)²  (sum over i = 1,…,n)

• We can show theoretically that its expectation is equal to the variance of the original distribution (the simulation sketch below illustrates this)

• We know that its distribution is approximately normal if the sample is large

• If the underlying distribution is normal N(μ,σ²):
  – Var(S²) = 2σ⁴/(n-1)
  – (n-1)S²/σ² is distributed as the χ²_{n-1} distribution
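A simulation sketch of the first claim (my own illustration; μ=5, σ=2, and n=10 are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng()

# 100000 samples of size n=10 from N(5, 2^2), so sigma^2 = 4
x = rng.normal(loc=5.0, scale=2.0, size=(100_000, 10))
s2 = x.var(axis=1, ddof=1)               # sample variance, with the 1/(n-1) factor

print(s2.mean())                         # close to sigma^2 = 4 (unbiasedness)
print(s2.var(), 2 * 4**2 / (10 - 1))     # both close to 2*sigma^4/(n-1) = 3.56
```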

Page 21: The Chi-square distribution

• The Chi-square distribution with n degrees of freedom is denoted χ²_n

• It is equal to the sum of the squares of n independent random variables with standard normal distributions.

[Figure: The density functions of the χ²_2 and χ²_4 distributions]

Page 22: Estimation

• We have previously looked at
  – Probability models (with parameters)
  – Properties of samples from such probability models

• We now turn this around and start with a dataset, and try to find a probability model fitting the data.

• A (point) estimator is a function of the data, meant to estimate a parameter of the model

• A (point) estimate is a value of the estimator, computed from the data

Page 23: Properties of estimators

• An estimator is unbiased if its expectation is equal to the parameter it is estimating

• The bias of an estimator is its expectation minus the parameter it is estimating

• The efficiency of an unbiased estimator is measured by its variance: One would like to have estimators with high efficiency (i.e., low variance)

Page 24: Confidence intervals: Example

• Assume μ and σ² are some real numbers, and assume the data X1,X2,…,Xn are a random sample from N(μ,σ²).
  – Then
      Z = (X̄ - μ) / (σ/√n) ~ N(0,1)
  – thus
      P(-1.96 ≤ Z ≤ 1.96) = 95%
  – so
      P(X̄ - 1.96σ/√n ≤ μ ≤ X̄ + 1.96σ/√n) = 95%

  and we say that (X̄ - 1.96σ/√n, X̄ + 1.96σ/√n) is a confidence interval for μ with 95% confidence, based on the statistic X̄
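A sketch of this interval in Python (the helper name and the example data are mine; σ is assumed known):

```python
import numpy as np

def mean_ci_95(x, sigma):
    """95% confidence interval for mu based on the statistic X-bar."""
    xbar = np.mean(x)
    half = 1.96 * sigma / np.sqrt(len(x))
    return xbar - half, xbar + half

rng = np.random.default_rng()
data = rng.normal(loc=10.0, scale=3.0, size=25)   # sample with mu=10, sigma=3
print(mean_ci_95(data, sigma=3.0))                # an interval that usually contains 10
```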

Page 25: Confidence intervals: interpretation

• Interpretation: If we do the following a large number of times:
  – We pick μ (and σ²)
  – We generate data and the statistic X̄
  – We compute the confidence interval
  then the confidence interval will contain μ roughly 95% of the time (checked by simulation below).

• Note: The confidence interval pertains to μ (and σ²), and to the particular statistic. If a different statistic is used, a different confidence interval could result.
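This interpretation can be checked by simulation; a sketch with numpy (μ, σ, and n chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng()
mu, sigma, n = 10.0, 3.0, 25

hits = 0
for _ in range(10_000):
    x = rng.normal(mu, sigma, size=n)
    half = 1.96 * sigma / np.sqrt(n)
    if x.mean() - half <= mu <= x.mean() + half:
        hits += 1

print(hits / 10_000)   # close to 0.95: the interval covers mu about 95% of the time
```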

Page 26: Example: a different statistic

• Assume in the example above we use Z₀ = (X₁ - μ)/σ instead of Z = (X̄ - μ)/(σ/√n).

• We then get Z₀ ~ N(0,1) as before, and the confidence interval

  (X₁ - 1.96σ, X₁ + 1.96σ)

• Note how this is different from before, as we have used a different statistic.

Page 27: Alternative concept: Credibility interval

• The knowledge about μ can be formulated as a probability distribution

• If an interval I has 95% probability under this distribution, then I is called a credibility interval for μ, with credibility 95%

• It is very common, but wrong, to interpret confidence intervals as credibility intervals

Page 28: Example: Finding credibility intervals

• We must always start with a probability distribution π(μ) describing our knowledge about μ before looking at data

• As above, the probability distribution g for Z|μ is the normal distribution N(μ, σ²/n)

• Using Bayes formula, we get a probability distribution f for μ|Z:

  f(μ|Z) = g(Z|μ)π(μ) / P(Z)

Page 29: Finding credibility intervals (cont.)

• IF we assume "flat" knowledge about μ before observing data, i.e., that π(μ) = 1, then

  μ|Z ~ N(X̄, σ²/n)

  and a credibility interval becomes (X̄ - 1.96σ/√n, X̄ + 1.96σ/√n)

• Similarly, if we assume π(μ) = 1 and only observe X1, then a credibility interval becomes (X₁ - 1.96σ, X₁ + 1.96σ)

Page 30: Summary on confidence and credibility intervals

• Confidence and credibility intervals are NOT the same.

• A confidence interval says something about a parameter AND a random variable (or statistic) based on it.

• A credibility interval describes the knowledge about the parameter; it must always be based on a specification of the knowledge held before making the observations, as well as on the observations themselves

• In many cases, computed confidence intervals correspond to credibility intervals with a certain prior knowledge assumed.

