
Sampling and estimation

Petter Mostad

2005.09.26

The normal distribution

• The most used continuous probability distribution:
  – Many observations tend to approximately follow this distribution
  – It is easy and nice to do computations with
  – BUT: Using it can result in wrong conclusions when it is not appropriate

[Figure: the normal density curve, with the mean μ and the points μ − 2σ and μ + 2σ marked.]

The normal distribution

• The probability density function is
$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(x-\mu)^2/(2\sigma^2)}$$
• where $E(X) = \mu$ and $\mathrm{Var}(X) = \sigma^2$
• Notation: $N(\mu, \sigma^2)$
• Standard normal distribution: $N(0, 1)$
• Using the normal density is often OK unless the actual distribution is very skewed

Normal probability plots

• Plotting the quantiles of the data versus the quantiles of the distribution.

• If the data is approximately normally distributed, the plot will approximately show a straight line

[Figure: Normal Q-Q plot of household income in thousands, with observed values plotted against expected normal quantiles.]
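Such a plot can be produced directly; a minimal sketch in Python, assuming SciPy and Matplotlib are available, and using made-up skewed "income" data in place of the household income data from the slide:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(0)
income = rng.lognormal(mean=4.0, sigma=0.8, size=200)  # made-up, skewed data

# probplot sorts the data and plots it against the corresponding
# quantiles of the normal distribution; an approximately straight
# line indicates approximate normality
stats.probplot(income, dist="norm", plot=plt)
plt.show()
```

Since the toy data here is skewed, the points will bend away from the straight line, as in the household income plot above.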

The Normal versus the Binomial distribution

• When n is large and π is not too close to 0 or 1, then the Binomial distribution becomes very similar to the Normal distribution with the same expectation and variance.

• This is a phenomenon that happens for all distributions that can be seen as a sum of independent observations.

• It can be used to make approximate computations for the Binomial distribution.
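A minimal sketch of such an approximate computation, assuming SciPy is available (the values of n, π and k are arbitrary choices, and the continuity correction of 0.5 is a standard refinement not discussed above):

```python
import math
from scipy import stats

n, pi, k = 100, 0.3, 35
exact = stats.binom.cdf(k, n, pi)            # exact Binomial probability

mu = n * pi                                  # same expectation
sigma = math.sqrt(n * pi * (1 - pi))         # same standard deviation
approx = stats.norm.cdf(k + 0.5, mu, sigma)  # normal approximation

print(f"P(X <= {k}): exact {exact:.4f}, approximate {approx:.4f}")
```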

The Exponential distribution

• The exponential distribution is a distribution for positive numbers (parameter λ):
$$f(t) = \lambda e^{-\lambda t}$$
• It can be used to model the time until an event, when events arrive randomly at a constant rate
• $E(T) = 1/\lambda$ and $\mathrm{Var}(T) = 1/\lambda^2$

Sampling

• We need to start connecting the probability models we have introduced with the data we want to analyze.

• We (usually) want to regard our data as a simple random sample from a probability model:
  – Each observation is sampled independently of the others
  – Each observation is sampled from the same probability model

• Thus we go on to study the properties of simple random samples.

Example: The mean of a random sample

• If $X_1, X_2, \ldots, X_n$ is a random sample, then their sample mean is defined as
$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$$
• As it is a function of random variables, it is a random variable.
• If $E(X_i) = \mu$, then $E(\bar{X}) = \mu$
• If $\mathrm{Var}(X_i) = \sigma^2$, then $\mathrm{Var}(\bar{X}) = \sigma^2/n$

Example

• Assume $X_1, X_2, \ldots, X_{10}$ is a random sample from the binomial distribution Bin(20, 0.2)
• We get
$$E(X_i) = 20 \cdot 0.2 = 4 \qquad \mathrm{Var}(X_i) = 20 \cdot 0.2 \cdot (1 - 0.2) = 3.2$$
$$E(\bar{X}) = 4 \qquad \mathrm{Var}(\bar{X}) = 3.2/10 = 0.32$$

[Figure: the probability distribution of Bin(20, 0.2).]

Simulation

• Simulation: To generate outcomes by computer, on the basis of pseudo-random numbers

• Pseudo-random number: Generated by an algorithm completely unrelated to the way the numbers are used, so that they appear random. Usually generated to be uniformly distributed between 0 and 1.

• There is a correspondence between random variables and algorithms to simulate outcomes.

Examples

• To simulate outcomes 1,2,…,6 each with probability 1/6: Simulate pseudo-random u in [0,1), and let the outcome be i if u is between (i-1)/6 and i/6.

• To simulate exponentially distributed X with parameter λ: Simulate pseudo-random u in [0,1), and compute x=-log(u)/λ
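Both examples can be written down directly; a minimal sketch using only Python's standard library (replacing u by 1 − u in the exponential case is a small safeguard of mine, since log(0) is undefined when u can be exactly 0, and 1 − u is uniform on (0, 1] whenever u is uniform on [0, 1)):

```python
import math
import random

def simulate_die(u: float) -> int:
    """Outcome i in {1,...,6} when u lies between (i-1)/6 and i/6."""
    return int(u * 6) + 1

def simulate_exponential(u: float, lam: float) -> float:
    """Inverse transform: -log(1-u)/lam has the same distribution
    as -log(u)/lam, but avoids log(0)."""
    return -math.log(1.0 - u) / lam

u = random.random()   # pseudo-random number in [0, 1)
print(simulate_die(u), simulate_exponential(u, lam=2.0))
```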

Stochastic variables and simulation of outcomes

The histogram of n simulated values will approach the probability distribution simulated from, as n increases

[Figure: histograms of n simulated values for n = 100, n = 1000, and n = 100000, together with the density being simulated from.]
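This convergence is easy to reproduce; a minimal sketch, assuming NumPy and Matplotlib are available, and using the exponential distribution with λ = 1 as an example distribution:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
fig, axes = plt.subplots(1, 3, figsize=(12, 3))
for ax, n in zip(axes, [100, 1000, 100000]):
    x = rng.exponential(scale=1.0, size=n)   # scale is 1/lambda
    ax.hist(x, bins=30, density=True)        # normalized histogram
    t = np.linspace(0.0, x.max(), 200)
    ax.plot(t, np.exp(-t))                   # the density f(t) = e^(-t)
    ax.set_title(f"n = {n}")
plt.show()
```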

Using simulation to study properties of samples

• We saw how we can theoretically find the expectation and variance of some functions of a sample

• Instead, we can simulate the function of the sample a large number of times, and study the distribution of these numbers: This gives approximate results.

Example

• $X_1, X_2, \ldots, X_{10}$ is a random sample from the binomial distribution Bin(20, 0.2)
• Simulating these 100000 times, and computing $\bar{X}$, we get

• The average of these 100000 numbers is 4.001, the variance is 0.3229

[Figure: histogram of the 100000 simulated values of $\bar{X}$.]
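A minimal sketch of this simulation, assuming NumPy is available (the printed values will differ slightly from run to run, just as the slide's 4.001 and 0.3229 come from one particular run):

```python
import numpy as np

rng = np.random.default_rng(2)
samples = rng.binomial(n=20, p=0.2, size=(100000, 10))
means = samples.mean(axis=1)            # 100000 simulated sample means

print(means.mean())                     # close to E(X-bar) = 4
print(means.var(ddof=1))                # close to Var(X-bar) = 0.32
```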

Studying the properties of averages

• If X1,X2,…,Xn is a random sample from some distribution, it is very common to want to study the mean

• In the following example, we have sampled from the Exponential distribution with parameter λ = 1:
  – First (done 10000 times) taken the average of 3 samples
  – Then (done 10000 times) taken the average of 30 samples
  – Then (done 10000 times) taken the average of 300 samples

[Figure: four panels: the Exponential(λ = 1) density, and histograms of the averages of 3, of 30, and of 300 samples, 10000 repetitions each.]
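The experiment behind these histograms can be reproduced along these lines; a minimal sketch, assuming NumPy and Matplotlib are available:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(3)
fig, axes = plt.subplots(1, 3, figsize=(12, 3))
for ax, n in zip(axes, [3, 30, 300]):
    # 10000 averages, each of n Exponential(1) samples
    means = rng.exponential(scale=1.0, size=(10000, n)).mean(axis=1)
    ax.hist(means, bins=40)
    ax.set_title(f"Average of {n}")
plt.show()
```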

The Central Limit Theorem

• It is a very important fact that the above happens no matter what distribution you start with.

• The theorem states: If $X_1, X_2, \ldots, X_n$ is a random sample from a distribution with expectation μ and variance σ², then
$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$
approaches a standard normal distribution when n gets large.

Example

• Let X be from Bin(n, π):
• X/n can be seen as the average of n Bernoulli variables, so we can apply the theory
• We get that when n grows, the expression
$$\frac{X/n - \pi}{\sqrt{\pi(1-\pi)/n}}$$
gets an approximate standard normal distribution N(0, 1).
• A rule for when to accept the approximation: $n\pi(1-\pi) \geq 9$

The sampling distribution of the sample variance

• Recall: the sample variance is
$$S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2$$
• We can show theoretically that its expectation is equal to the variance of the original distribution
• We know that its distribution is approximately normal if the sample is large
• If the underlying distribution is normal N(μ, σ²):
  – $\mathrm{Var}(S^2) = \dfrac{2\sigma^4}{n-1}$
  – $\dfrac{(n-1)S^2}{\sigma^2}$ is distributed as the $\chi^2_{n-1}$ distribution
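These facts can be checked by simulation; a minimal sketch, assuming NumPy and SciPy are available (μ, σ and n are arbitrary choices of mine):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
mu, sigma, n = 0.0, 2.0, 15
x = rng.normal(mu, sigma, size=(100000, n))
s2 = x.var(axis=1, ddof=1)                     # sample variances S^2

print(s2.mean())                               # close to sigma^2 = 4
print(s2.var(ddof=1), 2 * sigma**4 / (n - 1))  # Var(S^2) vs 2*sigma^4/(n-1)

# (n-1) S^2 / sigma^2 should follow a chi-square distribution with
# n-1 degrees of freedom, whose expectation is n-1:
stat = (n - 1) * s2 / sigma**2
print(stat.mean(), stats.chi2(df=n - 1).mean())
```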

The Chi-square distribution

• The Chi-square distribution with n degrees of freedom is denoted $\chi^2_n$
• It is the distribution of the sum of the squares of n independent random variables with standard normal distributions.

[Figure: densities of the chi-square distribution with 2 and 4 degrees of freedom.]

Estimation

• We have previously looked at
  – Probability models (with parameters)
  – Properties of samples from such probability models

• We now turn this around and start with a dataset, and try to find a probability model fitting the data.

• A (point) estimator is a function of the data, meant to estimate a parameter of the model

• A (point) estimate is a value of the estimator, computed from the data

Properties of estimators

• An estimator is unbiased if its expectation is equal to the parameter it is estimating

• The bias of an estimator is its expectation minus the parameter it is estimating

• The efficiency of an unbiased estimator is measured by its variance: One would like to have estimators with high efficiency (i.e., low variance)
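Bias can be illustrated by simulation; a minimal sketch, assuming NumPy, comparing the variance estimator that divides by n (biased) with the sample variance that divides by n − 1 (unbiased):

```python
import numpy as np

rng = np.random.default_rng(5)
sigma2, n = 4.0, 5
x = rng.normal(0.0, np.sqrt(sigma2), size=(200000, n))

# Dividing by n underestimates: its expectation is (n-1)/n * sigma^2 = 3.2
print(x.var(axis=1, ddof=0).mean())
# Dividing by n-1 is unbiased: its expectation is sigma^2 = 4.0
print(x.var(axis=1, ddof=1).mean())
```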

Confidence intervals: Example

• Assume μ and σ² are some real numbers, and assume the data $X_1, X_2, \ldots, X_n$ are a random sample from N(μ, σ²).
  – Then
$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$$
  – thus
$$P(-1.96 \leq Z \leq 1.96) = 95\%$$
  – so
$$P\left(\bar{X} - 1.96\tfrac{\sigma}{\sqrt{n}} \leq \mu \leq \bar{X} + 1.96\tfrac{\sigma}{\sqrt{n}}\right) = 95\%$$
  and we say that $\left(\bar{X} - 1.96\tfrac{\sigma}{\sqrt{n}},\ \bar{X} + 1.96\tfrac{\sigma}{\sqrt{n}}\right)$ is a confidence interval for μ with 95% confidence, based on the statistic $\bar{X}$.
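Computing the interval for a concrete dataset is direct; a minimal sketch, assuming NumPy, with made-up values of μ, σ and n, and with σ treated as known:

```python
import numpy as np

rng = np.random.default_rng(6)
mu, sigma, n = 10.0, 3.0, 25          # made-up example values
x = rng.normal(mu, sigma, size=n)     # the data X_1, ..., X_n

xbar = x.mean()
half = 1.96 * sigma / np.sqrt(n)      # half-width of the interval
print(f"95% confidence interval: ({xbar - half:.2f}, {xbar + half:.2f})")
```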

Confidence intervals: interpretation

• Interpretation: If we do the following a large number of times:
  – We pick μ (and σ²)
  – We generate data and the statistic $\bar{X}$
  – We compute the confidence interval
  then the confidence interval will contain μ roughly 95% of the time.
• Note: The confidence interval pertains to μ (and σ²), and to the particular statistic. If a different statistic is used, a different confidence interval could result.
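This interpretation can itself be checked by simulation; a minimal sketch, assuming NumPy, repeating the experiment 10000 times and counting how often the interval contains μ:

```python
import numpy as np

rng = np.random.default_rng(7)
mu, sigma, n = 10.0, 3.0, 25
half = 1.96 * sigma / np.sqrt(n)

covered = 0
for _ in range(10000):
    x = rng.normal(mu, sigma, size=n)
    covered += (x.mean() - half <= mu <= x.mean() + half)

print(covered / 10000)   # roughly 0.95
```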

Example: a different statistic

• Assume in the example above we use $X_1$ instead of $\bar{X}$, i.e., the statistic
$$Z_1 = \frac{X_1 - \mu}{\sigma} \quad \text{instead of} \quad Z_0 = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$
• We then get $Z_1 \sim N(0, 1)$ as before, and the confidence interval
$$(X_1 - 1.96\sigma,\ X_1 + 1.96\sigma)$$
• Note how this is different from before, as we have used a different statistic.

Alternative concept: Credibility interval

• The knowledge about μ can be formulated as a probability distribution

• If an interval I has 95% probability under this distribution, then I is called a credibility interval for μ, with credibility 95%

• It is very common, but wrong, to interpret confidence intervals as credibility intervals

Example: Finding credibility intervals

• We must always start with a probability distribution π(μ) describing our knowledge about μ before looking at data

• As above, the probability distribution g for Z|μ is the normal distribution N(μ, σ²/n)

• Using Bayes formula, we get a probability distribution f for μ|Z:
$$f(\mu \mid Z) = \frac{g(Z \mid \mu)\,\pi(\mu)}{P(Z)}$$

Finding credibility intervals (cont.)

• IF we assume "flat" knowledge about μ before observing data, i.e., that π(μ) = 1, then
$$\mu \mid Z \sim N(\bar{X}, \sigma^2/n)$$
and a credibility interval becomes
$$\left(\bar{X} - 1.96\tfrac{\sigma}{\sqrt{n}},\ \bar{X} + 1.96\tfrac{\sigma}{\sqrt{n}}\right)$$
• Similarly, if we assume π(μ) = 1 and only observe $X_1$, then a credibility interval becomes
$$(X_1 - 1.96\sigma,\ X_1 + 1.96\sigma)$$

Summary on confidence and credibility intervals

• Confidence and credibility intervals are NOT the same.

• A confidence interval says something about a parameter AND a random variable (or statistic) based on it.

• A credibility interval describes the knowledge about the parameter; it must always be based both on a specification of the knowledge held before making the observations, and on the observations themselves

• In many cases, computed confidence intervals correspond to credibility intervals with a certain prior knowledge assumed.