Chap2 Sampling Distns I

8/7/2019 Chap2 Sampling Distns I


2. SAMPLING DISTRIBUTION

2.1 Preliminaries

Numerical descriptive measures computed from the population measurements are called parameters. A statistic is a quantity calculated from the observations in a sample.

Population mean: $\mu$ and sample mean: $\bar{x}$

Population variance: $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2$

Sample variance: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$
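As a quick check of the two variance formulas, the sketch below computes them directly and compares the results with Python's standard `statistics` module, where `pvariance` divides by $N$ and `variance` divides by $n-1$. The data are the five population values 3, 6, 9, 12, 15 used in Section 2.2. The helper function names are illustrative only.

```python
import statistics

def population_variance(values):
    """sigma^2 = (1/N) * sum((x_i - mu)^2)"""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)

def sample_variance(values):
    """s^2 = (1/(n-1)) * sum((x_i - xbar)^2)"""
    xbar = sum(values) / len(values)
    return sum((x - xbar) ** 2 for x in values) / (len(values) - 1)

data = [3, 6, 9, 12, 15]
sigma2 = population_variance(data)   # divides by N = 5
s2 = sample_variance(data)           # divides by n - 1 = 4

# The standard library gives the same results.
assert sigma2 == statistics.pvariance(data)
assert s2 == statistics.variance(data)
print(sigma2, s2)  # 18.0 22.5
```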

The standard error of a statistic is the standard deviation of the sampling distribution of that statistic.

2.1.1 Introduction

In M25A, you were introduced to some useful random variables and their

probability distributions. In practical sampling situations, we select a sample of n

observations and use these measurements to calculate statistics such as the sample

mean and variance. These statistics are used to make inferences about the

corresponding parameters in the sampled population. Since the value of a statistic depends on the observed values in the sample, a statistic is itself a random

variable that may be discrete or continuous. The probability distribution of a

statistic is called its sampling distribution, since it describes the behaviour of the

statistic in repeated sampling.

The sampling distribution of a statistic is the probability distribution for the

values of the statistic that result when random samples of size n are repeatedly

drawn from the population.

The sampling distribution may be derived mathematically or approximated

empirically. Empirical approximations are found by drawing a large number of

samples of size n from the specified population, calculating the value of the

statistic for each sample, and tabulating the results in a relative frequency histogram.

When the number of samples is large, the relative frequency histogram should

closely approximate the theoretical sampling distribution.
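The empirical approach just described can be sketched in a few lines of Python. For illustration, the sketch uses the population 3, 6, 9, 12, 15 from Section 2.2 and draws samples of size $n = 3$ without replacement using `random.sample`. It then tabulates the relative frequency of each value of $\bar{x}$. The number of repetitions and the seed are arbitrary choices.

```python
import random
from collections import Counter

def empirical_sampling_distribution(population, n, num_samples, seed=1):
    """Approximate the sampling distribution of the sample mean by simulation."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(num_samples):
        sample = rng.sample(population, n)    # one random sample without replacement
        counts[sum(sample) / n] += 1          # record its sample mean
    # Convert counts to relative frequencies, ordered by the value of x-bar.
    return {xbar: c / num_samples for xbar, c in sorted(counts.items())}

rel_freq = empirical_sampling_distribution([3, 6, 9, 12, 15], n=3, num_samples=100_000)
for xbar, f in rel_freq.items():
    print(f"{xbar:5.1f}  {f:.3f}")
```

With this many repetitions, the relative frequencies should be close to the exact probabilities 0.1, 0.1, 0.2, 0.2, 0.2, 0.1, 0.1 derived in Section 2.2.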



2.2 Sampling Distributions

Consider a random sample of size $n = 3$ drawn without replacement from a population of $N = 5$ elements. A simple random sample of size $n$ is selected so that every possible sample of size $n$ has the same probability of being selected, equal to $1/\binom{N}{n}$, where $\binom{N}{n}$ is the number of possible samples.

Suppose we have a population of $N = 5$ elements whose values are 3, 6, 9, 12 and 15. Since there are five distinct, equally likely elements, the population probability distribution is

$p(x) = \frac{1}{5}$ for $x = 3, 6, 9, 12, 15$

Sample    Sample values    $\bar{x}$    $m$ (median)
1    3, 6, 9    6    6
2    3, 6, 12    7    6
3    3, 6, 15    8    6
4    3, 9, 12    8    9
5    3, 9, 15    9    9
6    3, 12, 15    10    12
7    6, 9, 12    9    9
8    6, 9, 15    10    9
9    6, 12, 15    11    12
10    9, 12, 15    12    12

The table above shows the values of $\bar{x}$ and $m$ (the median) associated with each sample. Each sample is selected with probability $1/\binom{5}{3} = 1/10$. So we will observe a value of $\bar{x} = 6$ only if sample 1 is selected, and this occurs with probability 0.1. A value of $\bar{x} = 8$ will occur if sample 3 or sample 4 is drawn, so the probability of observing $\bar{x} = 8$ is 0.2.

Hence, the sampling distribution of $\bar{x}$ is shown below.

$\bar{x}$    $p(\bar{x})$
6    0.1
7    0.1
8    0.2
9    0.2
10    0.2
11    0.1
12    0.1
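Because the population is very small, the sampling distributions above can be derived exactly by listing all $\binom{5}{3} = 10$ equally likely samples. The sketch below does this with `itertools.combinations` and exact `Fraction` arithmetic, for both $\bar{x}$ and the median $m$. The function name is illustrative.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import combinations
from statistics import median

def exact_sampling_distribution(population, n, statistic):
    """Enumerate every sample of size n (without replacement); each has probability 1/C(N, n)."""
    samples = list(combinations(population, n))
    prob = Fraction(1, len(samples))
    dist = defaultdict(Fraction)
    for s in samples:
        dist[statistic(s)] += prob
    return dict(sorted(dist.items()))

population = [3, 6, 9, 12, 15]
mean_dist = exact_sampling_distribution(population, 3, lambda s: Fraction(sum(s), len(s)))
median_dist = exact_sampling_distribution(population, 3, median)

for value, p in mean_dist.items():
    print(value, p)   # e.g. "8 1/5"
# The mean of the sampling distribution of x-bar equals the population mean.
print(sum(v * p for v, p in mean_dist.items()))   # 9
```

The same enumeration gives the sampling distribution of the median: $P(m = 6) = 0.3$, $P(m = 9) = 0.4$, and $P(m = 12) = 0.3$. These values agree with the $m$ column of the first table.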

2.3 Central Limit Theorem

If random samples of $n$ observations are drawn from a non-normal population with finite mean $\mu$ and standard deviation $\sigma$, then, when $n$ is large, the sampling distribution of the sample mean $\bar{x}$ is approximately normal, with mean and standard deviation

$\mu_{\bar{x}} = \mu$ and $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$

[Figure: diagrams illustrating the Central Limit Theorem]
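A small simulation makes the CLT concrete. For illustration, the sketch samples from an exponential population with $\mu = \sigma = 1$, which is strongly skewed and clearly non-normal, and uses $n = 40$. It checks that the simulated sample means have a mean close to $\mu$ and a standard deviation close to $\sigma/\sqrt{n}$. The population, $n$, the number of repetitions and the seed are all arbitrary choices.

```python
import math
import random
import statistics

def simulate_sample_means(n, reps, seed=2):
    """Return `reps` sample means, each from n draws of an Exponential(1) population."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

n = 40
means = simulate_sample_means(n, reps=20_000)
mu, sigma = 1.0, 1.0  # Exponential(rate=1) has mean 1 and standard deviation 1

print(statistics.fmean(means), mu)                     # mean of x-bar vs mu
print(statistics.stdev(means), sigma / math.sqrt(n))  # s.d. of x-bar vs sigma/sqrt(n)
```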



2.4 Sampling distribution: Sample Mean

The standard deviation of a statistic used as an estimator of a population parameter is often called the standard error of the estimator, since it measures the precision of the estimator. Thus, the standard deviation of $\bar{x}$, $\sigma_{\bar{x}} = \sigma/\sqrt{n}$, is referred to as the standard error (s.e.) of the mean.

Example 2.1

Suppose that you select a random sample of $n = 25$ observations from a population with mean $\mu = 8$ and standard deviation $\sigma = 0.6$. Find the probability that the sample mean $\bar{x}$ will

a) be less than 7.9; b) exceed 7.9; c) lie within 0.1 of $\mu = 8$.

Solution

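a) Since $n = 25$ is relatively large, the sampling distribution of $\bar{x}$ is approximately normal by the CLT, with mean $\mu_{\bar{x}} = \mu = 8$ and

$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{0.6}{\sqrt{25}} = 0.12$

Now,

$P(\bar{x} < 7.9) = P\left(\frac{\bar{x} - \mu}{\sigma_{\bar{x}}} < \frac{7.9 - 8.0}{0.12}\right) = P(Z < -0.83) = 0.2033$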



b) $P(\bar{x} > 7.9) = P(Z > -0.83) = 0.7967$

c) $P(7.9 < \bar{x} < 8.1) = P(-0.83 < Z < 0.83) = 0.7967 - 0.2033 = 0.5934$
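The three probabilities in Example 2.1 can be checked with `statistics.NormalDist` from the Python standard library (Python 3.8+). This sketch uses the unrounded z-score $-0.1/0.12 = -0.833\ldots$, so its answers differ slightly in the third decimal place from the table-based values, which use $z = -0.83$.

```python
import math
from statistics import NormalDist

mu, sigma, n = 8.0, 0.6, 25
xbar_dist = NormalDist(mu, sigma / math.sqrt(n))   # approximate sampling distribution of x-bar (CLT)

p_a = xbar_dist.cdf(7.9)                       # a) P(x-bar < 7.9)
p_b = 1 - xbar_dist.cdf(7.9)                   # b) P(x-bar > 7.9)
p_c = xbar_dist.cdf(8.1) - xbar_dist.cdf(7.9)  # c) P(7.9 < x-bar < 8.1)
print(round(p_a, 4), round(p_b, 4), round(p_c, 4))
```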

2.5 Sampling distribution: Sample Proportion

Consider a sampling problem involving consumer preferences or an opinion poll, where we want to estimate the proportion $p$ of people in the population who possess some specific characteristic. These are practical examples of binomial experiments, provided the sampling procedure has been conducted in the appropriate manner.

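(i) If a random sample of $n$ observations is selected from a binomial population with parameter $p$, then the sampling distribution of the sample proportion

$\hat{p} = \frac{x}{n}$

will have mean $\mu_{\hat{p}} = p$ and standard deviation $\sigma_{\hat{p}} = \sqrt{\frac{pq}{n}}$, where $q = 1 - p$.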

[Figure: the probability that $\bar{x}$ lies within 0.1 of $\mu$ (Example 2.1)]



(ii) When the sample size is large, the sampling distribution of $\hat{p}$ can be approximated by a normal distribution. The approximation will be adequate if $p - 2\sigma_{\hat{p}}$ and $p + 2\sigma_{\hat{p}}$ both fall in the interval 0 to 1.

(iii) A rule of thumb for the approximation to be satisfactory is that $np > 5$ and $nq > 5$.

(iv) We use this normal approximation to evaluate the probability that the binomial variable $Y$ is less than or greater than a particular value $y$. Since $y$ is an integer, we must account for the fact that we are approximating a discrete random variable $Y$ by a continuous random variable $X$. We therefore think of the probability mass corresponding to the value $y$ as being spread over the interval $\left(y - \tfrac{1}{2},\ y + \tfrac{1}{2}\right)$.

Hence, using a continuity correction:

$P(Y \le y) \approx P\left(X \le y + \tfrac{1}{2}\right)$, i.e. adding a half

$P(Y \ge y) \approx P\left(X \ge y - \tfrac{1}{2}\right)$, i.e. subtracting a half

When $X$ is continuous, $P(X = x) = 0$ for any $x$, so we can specify probabilities only for intervals, not at points. However, using a normal approximation, we can take $P(Y = y)$ to be equal to $P\left(y - \tfrac{1}{2} \le X \le y + \tfrac{1}{2}\right)$. So $P(Y \ge y)$ and $P(Y > y)$ are not the same; they differ by an amount equal to $P(Y = y)$.

Hence, using the continuity correction:

$P(Y < y) \approx P\left(X \le y - \tfrac{1}{2}\right)$

$P(Y > y) \approx P\left(X \ge y + \tfrac{1}{2}\right)$
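To see why the half-unit correction helps, the sketch below compares an exact binomial probability, computed with `math.comb`, against its normal approximation with and without the continuity correction. The values $n = 20$, $p = 0.5$ and $y = 8$ are arbitrary choices for illustration.

```python
import math
from statistics import NormalDist

def binom_cdf(y, n, p):
    """Exact P(Y <= y) for Y ~ Binomial(n, p)."""
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(y + 1))

n, p, y = 20, 0.5, 8
X = NormalDist(n * p, math.sqrt(n * p * (1 - p)))   # approximating normal: mean np, s.d. sqrt(npq)

exact = binom_cdf(y, n, p)
no_correction = X.cdf(y)          # P(X <= y)
with_correction = X.cdf(y + 0.5)  # P(X <= y + 1/2), the continuity-corrected version
print(f"exact={exact:.4f}  uncorrected={no_correction:.4f}  corrected={with_correction:.4f}")
```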



Example 2.2

A survey was conducted of 313 children, ages 14 to 22, of the nation's top corporate executives. When asked to identify the best aspect of being privileged in this group, 55% mentioned material and financial gains.

a) Describe the sampling distribution of the sample proportion $\hat{p}$.

b) Assume that the population proportion is 0.5. What is the probability of observing a sample proportion as large as or larger than $\hat{p} = 0.55$?

Solution

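a) Since the sample size is large, the sampling distribution of $\hat{p}$ is approximately normal, with mean $\mu_{\hat{p}} = p$ (estimated by $\hat{p} = 0.55$) and standard deviation

$\sigma_{\hat{p}} = \sqrt{\frac{pq}{n}} \approx \sqrt{\frac{\hat{p}\hat{q}}{n}} = \sqrt{\frac{(0.55)(0.45)}{313}} = 0.028$

Therefore, approximately 95% of the time, $\hat{p}$ will fall within $2\sigma_{\hat{p}} \approx 0.056$ of the unknown value of $p$.

We can check the condition that allows a normal approximation to the distribution of $\hat{p}$: $\hat{p} \pm 2\sigma_{\hat{p}} = 0.55 \pm 0.056$, or 0.494 to 0.606, which falls in the interval 0 to 1.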

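b) We are given that $\mu_{\hat{p}} = p = 0.5$ and

$\sigma_{\hat{p}} = \sqrt{\frac{pq}{n}} = \sqrt{\frac{(0.5)(0.5)}{313}} = 0.0283$

$P(\hat{p} \ge 0.55) = P\left(Z \ge \frac{0.55 - 0.5}{0.0283}\right) = P(Z \ge 1.77) = 0.0384$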

[Figure: the sampling distribution of $\hat{p}$ based on a sample of $n = 313$ children]



This tells us that if we were to select a random sample of $n = 313$ observations from a population with proportion $p = 0.5$, the probability that the sample proportion $\hat{p}$ would be as large as or larger than 0.55 is only about 4%.

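Alternatively, using the continuity correction: a correction of one half (in units of $Y$) corresponds to $\frac{1}{2n} = \frac{1}{626} \approx 0.0016$ on the proportion scale, so

$P(\hat{p} \ge 0.55) \approx P\left(Z \ge \frac{(0.55 - 0.0016) - 0.5}{0.0283}\right) = P(Z \ge 1.71) = 0.0436$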

When n is large, the effect of using the correction is generally negligible.
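The calculations in Example 2.2 can be reproduced with `statistics.NormalDist`. The sketch computes the estimated standard error for part a) and the tail probability for part b), both without and with the $\frac{1}{2n}$ continuity correction. Because it uses unrounded z-scores, the last digit may differ slightly from the hand calculation.

```python
import math
from statistics import NormalDist

n = 313
p_hat = 0.55

# a) standard error of p-hat, estimated with p-hat * q-hat
se_a = math.sqrt(p_hat * (1 - p_hat) / n)

# b) sampling distribution of p-hat when p = 0.5
p0 = 0.5
se_b = math.sqrt(p0 * (1 - p0) / n)
Z = NormalDist()
tail = 1 - Z.cdf((p_hat - p0) / se_b)                    # P(p-hat >= 0.55)
tail_cc = 1 - Z.cdf((p_hat - 1 / (2 * n) - p0) / se_b)   # with continuity correction 1/(2n)
print(round(se_a, 4), round(tail, 4), round(tail_cc, 4))
```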

2.6 Sampling distribution: Sum or Difference between Two Sample Means

When independent random samples of $n_1$ and $n_2$ observations have been selected from populations with means $\mu_1$ and $\mu_2$ and variances $\sigma_1^2$ and $\sigma_2^2$, respectively, the sampling distribution of the sum or difference of the sample means, $(\bar{x}_1 \pm \bar{x}_2)$, will have the following properties:

(a) The mean and standard deviation of $(\bar{x}_1 \pm \bar{x}_2)$ are

$\mu_{(\bar{x}_1 \pm \bar{x}_2)} = \mu_1 \pm \mu_2$ and $\sigma_{(\bar{x}_1 \pm \bar{x}_2)} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$

(b) If the sampled populations are normally distributed, then the sampling distribution is exactly normal, regardless of the sample sizes.

(c) If the sampled populations are not normally distributed, then the sampling distribution is approximately normal when the sample sizes are large, by the CLT.



Example 2.3

Random samples of 40 teachers each were selected from high schools in Kingston and in St Ann. What is the probability that the sample mean salary from Kingston will exceed the sample mean salary from St Ann by \$1500 or more? The Kingston mean salary is \$29,000, the St Ann mean salary is \$28,621, and the standard deviations of the two populations' salaries are \$5000 and \$4700, respectively.

Solution

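Let $\bar{x}_1$ be the sample mean salary for Kingston and $\bar{x}_2$ the sample mean salary for St Ann, and let $\sigma_1$ and $\sigma_2$ be the respective population standard deviations.

We are given that $\mu_1 = 29{,}000$, $\mu_2 = 28{,}621$, $\sigma_1 = 5000$ and $\sigma_2 = 4700$.

Then

$\mu_{(\bar{x}_1 - \bar{x}_2)} = \mu_1 - \mu_2 = 29{,}000 - 28{,}621 = 379$

$\sigma_{(\bar{x}_1 - \bar{x}_2)} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = \sqrt{\frac{5000^2}{40} + \frac{4700^2}{40}} = 1085.0115$

Since the sample sizes are large, we can use the normal approximation:

$P(\bar{x}_1 - \bar{x}_2 \ge 1500) = P\left(Z \ge \frac{1500 - 379}{1085.0115}\right) = P(Z \ge 1.03)$

$= 1 - P(Z < 1.03) = 1 - 0.8485 = 0.1515$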

[Figure: the sampling distribution of $(\bar{x}_1 - \bar{x}_2)$]
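Example 2.3 can be checked numerically by treating $(\bar{x}_1 - \bar{x}_2)$ as normal with the mean and standard deviation from Section 2.6. The sketch uses `statistics.NormalDist`. Since the z-score is not rounded to 1.03, the result differs slightly in the fourth decimal place.

```python
import math
from statistics import NormalDist

mu1, mu2 = 29_000, 28_621      # population mean salaries: Kingston, St Ann
sigma1, sigma2 = 5_000, 4_700  # population standard deviations
n1 = n2 = 40

mean_diff = mu1 - mu2                                 # 379
se_diff = math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)  # about 1085.01
prob = 1 - NormalDist(mean_diff, se_diff).cdf(1500)   # P(x1-bar - x2-bar >= 1500)
print(mean_diff, round(se_diff, 4), round(prob, 4))
```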



2.7 Sampling distribution: Difference between two sample proportions

Assume that independent random samples of $n_1$ and $n_2$ observations have been selected from binomial populations with parameters $p_1$ and $p_2$, respectively. Then the sampling distribution of the difference between the sample proportions,

$(\hat{p}_1 - \hat{p}_2) = \frac{x_1}{n_1} - \frac{x_2}{n_2}$,

will have the following properties:

(a) The mean and standard deviation of $(\hat{p}_1 - \hat{p}_2)$ are

$\mu_{(\hat{p}_1 - \hat{p}_2)} = p_1 - p_2$ and $\sigma_{(\hat{p}_1 - \hat{p}_2)} = \sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}$

(b) The sampling distribution of $(\hat{p}_1 - \hat{p}_2)$ can be approximated by a normal distribution when both sample sizes are large, by the CLT.

(c) When we use a normal distribution to approximate binomial probabilities, the interval $(p_1 - p_2) \pm 2\sigma_{(\hat{p}_1 - \hat{p}_2)}$ should fall within $-1$ to $1$.

Example 2.4

A local newspaper reported that 75% of the residents in the developing section and 60% of the residents in other parts of the city favour passage of a proposed bond issue to build a new school. Random samples of $n_1 = 50$ residents in the developing section of the city and $n_2 = 100$ residents in other parts of the city are selected, and the residents in each sample are asked whether they favour the bond proposal. What is the probability that the difference in magnitude between the sample proportions favouring the bond proposal does not exceed 10%?

Solution

Let us assume that $p_1 = 0.75$ and $p_2 = 0.60$, and that the sampling distribution of the difference between the sample proportions is approximately normal.



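So, $\mu_{(\hat{p}_1 - \hat{p}_2)} = p_1 - p_2 = 0.75 - 0.60 = 0.15$

and

$\sigma_{(\hat{p}_1 - \hat{p}_2)} = \sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}} = \sqrt{\frac{(0.75)(0.25)}{50} + \frac{(0.6)(0.4)}{100}} = 0.0784$

We wish to find $P(-0.1 < \hat{p}_1 - \hat{p}_2 < 0.1)$.

Hence,

$P(-0.1 < \hat{p}_1 - \hat{p}_2 < 0.1) = P\left(\frac{-0.1 - 0.15}{0.0784} < Z < \frac{0.1 - 0.15}{0.0784}\right) = P(-3.19 < Z < -0.64) = 0.2611 - 0.0007 = 0.2604$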
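As a numerical check on Example 2.4, the sketch below treats $(\hat{p}_1 - \hat{p}_2)$ as normal with the mean and standard deviation from Section 2.7 and takes the difference of two CDF values from `statistics.NormalDist`. The z-scores are not rounded, so the answer can differ slightly in the fourth decimal place.

```python
import math
from statistics import NormalDist

p1, p2 = 0.75, 0.60
n1, n2 = 50, 100

mean_diff = p1 - p2                                           # 0.15
se_diff = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)  # about 0.0784
D = NormalDist(mean_diff, se_diff)                            # approx. distribution of p1-hat - p2-hat
prob = D.cdf(0.1) - D.cdf(-0.1)                               # P(-0.1 < p1-hat - p2-hat < 0.1)
print(round(se_diff, 4), round(prob, 4))
```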



2.8 Large-Sample Estimation

Since populations are characterised by numerical descriptive measures called

parameters, statistical inference is concerned with making inferences about

population parameters. Methods for making inferences about parameters fall into

one of two categories. We may make decisions concerning the value of the

parameter, or we may estimate or predict the value of the parameter. Which method of inference should be used? That is, should the parameter be estimated, or should we test a hypothesis concerning its value?

Estimation procedures can be divided into two types, point estimation and

interval estimation.

An estimator is a statistic used to estimate a population parameter; it is a function of the sample observations.

An estimate is the value an estimator takes for a particular sample; it is also called a point estimate.

An interval estimator of a population parameter tells us how to calculate two

numbers based on sample data, forming an interval within which the parameter is

expected to lie. This pair of numbers is called an interval estimate or confidence interval.

Suppose we let $\hat{\theta}$ denote an estimator of the population parameter $\theta$ (for example $\mu$, $p$, or any other parameter). We would like our estimator to be unbiased, and we would like the spread of its sampling distribution to be as small as possible.

The distance between the estimate and the parameter, $|\hat{\theta} - \theta|$, is called the error of estimation.

The probability that a confidence interval will enclose the estimated parameter is called the confidence coefficient.

A good confidence interval is one that is as narrow as possible and has a large confidence coefficient, near 1. The narrower the interval, the more precisely we have located the estimated parameter.



Suppose we want to estimate the mean number of bacteria per cubic centimetre in a polluted stream. Suppose we draw 10 samples, each containing $n = 30$ observations, and construct a confidence interval for the population mean from each sample. The intervals might then appear as shown in the diagram below.

The horizontal line segments represent the ten intervals, and the vertical line represents the location of the true mean number of bacteria per cubic cm. Note that the parameter $\mu$ is fixed, while the location and width of the interval may vary from sample to sample. Thus, we speak of the probability that the interval encloses $\mu$, not the probability that $\mu$ falls in the interval, because $\mu$ is fixed. The interval is random.

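A $(1-\alpha)100\%$ confidence interval for $\theta$: $\hat{\theta} \pm z_{\alpha/2}\,\sigma_{\hat{\theta}}$

where $z_{\alpha/2}$ is the $z$ value corresponding to an area $\alpha/2$ in the upper tail of a standard normal distribution.

Also, $z_{\alpha/2}\,\sigma_{\hat{\theta}}$ is the bound on the error of estimation.

$\hat{\theta} + z_{\alpha/2}\,\sigma_{\hat{\theta}}$ is called the upper confidence limit, and $\hat{\theta} - z_{\alpha/2}\,\sigma_{\hat{\theta}}$ is called the lower confidence limit.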

[Figure: ten confidence intervals for the mean number of bacteria per cubic cm, each based on a sample of $n = 30$ observations]
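The repeated-sampling interpretation can be checked by simulation. The sketch below is purely illustrative. It assumes a normal population with a hypothetical true mean $\mu = 50$ and a known $\sigma = 10$, draws many samples of size $n = 30$, builds the interval $\bar{x} \pm z_{\alpha/2}\sigma/\sqrt{n}$ from each sample, and counts how often the interval encloses the fixed $\mu$.

```python
import math
import random
import statistics
from statistics import NormalDist

def coverage(mu, sigma, n, conf, reps, seed=3):
    """Fraction of simulated (conf*100)% intervals that enclose the fixed parameter mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # z_{alpha/2}
    half_width = z * sigma / math.sqrt(n)          # bound on the error of estimation
    hits = 0
    for _ in range(reps):
        xbar = statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - half_width <= mu <= xbar + half_width:   # the interval is random; mu is fixed
            hits += 1
    return hits / reps

rate = coverage(mu=50, sigma=10, n=30, conf=0.95, reps=10_000)
print(rate)  # should be close to 0.95
```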



2.9 Confidence Interval (CI) for Population Mean

A $(1-\alpha)100\%$ confidence interval for $\mu$: $\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$

Note: If $\sigma$ is unknown, it can be approximated by the sample standard deviation $s$ when the sample size is large.

Remark: If you want a confidence coefficient $(1-\alpha)$ equal to 0.95, then the total tail area is $\alpha = 0.05$, and half of $\alpha$ is placed in each tail of the distribution. Some commonly used confidence coefficients are shown in the table below.

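Confidence coefficient $(1-\alpha)$    $\alpha$    $z_{\alpha/2}$    LCL    UCL
0.90    0.10    1.645    $\bar{x} - 1.645\,\sigma/\sqrt{n}$    $\bar{x} + 1.645\,\sigma/\sqrt{n}$
0.95    0.05    1.96    $\bar{x} - 1.96\,\sigma/\sqrt{n}$    $\bar{x} + 1.96\,\sigma/\sqrt{n}$
0.99    0.01    2.58    $\bar{x} - 2.58\,\sigma/\sqrt{n}$    $\bar{x} + 2.58\,\sigma/\sqrt{n}$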
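The $z_{\alpha/2}$ column can be reproduced with the inverse normal CDF, `NormalDist().inv_cdf`, from Python's standard library. Here $z_{\alpha/2}$ is the point with area $1 - \alpha/2$ to its left. The exact value for 99% is about 2.576, which the table rounds to 2.58.

```python
from statistics import NormalDist

def z_half_alpha(conf):
    """z value with upper-tail area alpha/2, where alpha = 1 - conf."""
    alpha = 1 - conf
    return NormalDist().inv_cdf(1 - alpha / 2)

for conf in (0.90, 0.95, 0.99):
    print(conf, round(z_half_alpha(conf), 3))
```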


Example 2.5

Suppose that we wish to estimate the mean daily yield of a chemical manufactured in a chemical plant. The daily yield, recorded for 50 days, produced a mean of $\bar{x} = 871$ tons and a standard deviation of $s = 21$ tons. Find a 90% confidence interval for the population mean $\mu$.

Solution

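A 90% CI for $\mu$: $\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$

where $\alpha = 0.10$ and $z_{\alpha/2} = z_{0.05} = 1.645$.

Hence, using $s = 21$ to approximate $\sigma$:

$871 \pm 1.645\frac{21}{\sqrt{50}}$ or $871 \pm 4.89$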

Interpretation:

i. Therefore, we estimate the mean daily yield to be no more than 875.89 tons and no less than 866.11 tons.

ii. In repeated sampling, 90% of the confidence intervals formed in this way will enclose the true value of $\mu$.

iii. Therefore, we estimate that the mean daily yield lies in the interval from 866.11 to 875.89 tons.
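The large-sample interval in Example 2.5 can be wrapped in a small helper. The name `large_sample_mean_ci` is hypothetical, and, as in the example, the helper substitutes $s$ for $\sigma$. Because it uses the unrounded $z_{0.05} \approx 1.6449$, its limits may differ from 866.11 and 875.89 in the second decimal place.

```python
import math
from statistics import NormalDist

def large_sample_mean_ci(xbar, s, n, conf):
    """Large-sample CI for mu: xbar +/- z_{alpha/2} * s / sqrt(n), with s standing in for sigma."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    margin = z * s / math.sqrt(n)
    return xbar - margin, xbar + margin

lower, upper = large_sample_mean_ci(xbar=871, s=21, n=50, conf=0.90)
print(f"{lower:.2f} to {upper:.2f} tons")
```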



Confidence Interval for difference between two means

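A $(1-\alpha)100\%$ confidence interval for $(\mu_1 - \mu_2)$: $(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$

Note: If $\sigma_1^2$ and $\sigma_2^2$ are unknown, but both $n_1$ and $n_2$ are greater than or equal to 30, you can use the sample variances $s_1^2$ and $s_2^2$ to estimate $\sigma_1^2$ and $\sigma_2^2$.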

Example 2.6

The wearing quality of two types of automobile tyre was compared by road-testing samples of $n_1 = n_2 = 100$ tyres of each type. The number of miles until wear-out was recorded. Estimate the difference in mean miles to wear-out and its standard error, and find a 99% CI.

Tyre 1: $\bar{x}_1 = 26{,}400$ miles and $s_1^2 = 1{,}440{,}000$

Tyre 2: $\bar{x}_2 = 25{,}100$ miles and $s_2^2 = 1{,}960{,}000$

Solution

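The point estimate of $(\mu_1 - \mu_2)$ is $(\bar{x}_1 - \bar{x}_2) = 26{,}400 - 25{,}100 = 1300$ miles.

The standard error (s.e.) of $(\bar{x}_1 - \bar{x}_2)$ is

$\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \approx \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = \sqrt{\frac{1{,}440{,}000}{100} + \frac{1{,}960{,}000}{100}} = 184.4$ miles

A 99% CI for $(\mu_1 - \mu_2)$: $(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$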



i.e. $1300 \pm 2.58(184.4)$, or $1300 \pm 475.752$

Hence, we are 99% confident that the difference in mean miles to wear-out lies between 824.2 and 1775.8 miles.
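A sketch of the Example 2.6 calculation follows. The helper name `diff_means_ci` is hypothetical, and, as the note above allows for $n_1, n_2 \ge 30$, it uses the sample variances in place of $\sigma_1^2$ and $\sigma_2^2$. With the unrounded $z_{0.005} \approx 2.576$ instead of 2.58, the limits come out about 1 mile narrower than 824.2 and 1775.8.

```python
import math
from statistics import NormalDist

def diff_means_ci(x1, s1_sq, n1, x2, s2_sq, n2, conf):
    """Large-sample CI for mu1 - mu2, using sample variances in place of sigma^2."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    se = math.sqrt(s1_sq / n1 + s2_sq / n2)
    estimate = x1 - x2
    return estimate, se, (estimate - z * se, estimate + z * se)

est, se, (lo, hi) = diff_means_ci(26_400, 1_440_000, 100, 25_100, 1_960_000, 100, conf=0.99)
print(est, round(se, 1), round(lo, 1), round(hi, 1))
```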

Confidence Interval for Proportion

A $(1-\alpha)100\%$ confidence interval for $p$: $\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}$

Example 2.7

A random sample of 100 voters in a community produced $x = 59$ voters favouring candidate J. Find an estimate of the proportion of the population who favour candidate J, and also a 95% confidence interval for the population proportion.

Solution

A point estimate for $p$ is

$\hat{p} = \frac{x}{n} = \frac{59}{100} = 0.59$

A 95% CI for $p$: $\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}$

i.e. $0.59 \pm 1.96(0.049)$, or $0.59 \pm 0.09604$, that is, from 0.494 to 0.686.

Therefore, in repeated sampling, 95% of the confidence intervals calculated this way will enclose the true value of $p$.
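The Example 2.7 arithmetic can be reproduced as follows. This is a minimal sketch that uses $\hat{p}\hat{q}$ in the standard error, as in the solution, and takes $z_{0.025}$ from `NormalDist().inv_cdf` rather than the rounded 1.96.

```python
import math
from statistics import NormalDist

x, n = 59, 100
p_hat = x / n                             # point estimate: 0.59
se = math.sqrt(p_hat * (1 - p_hat) / n)   # estimated standard error, about 0.049
z = NormalDist().inv_cdf(0.975)           # z_{0.025}, about 1.96
margin = z * se
print(p_hat, round(se, 4), round(p_hat - margin, 3), round(p_hat + margin, 3))
```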



Confidence Interval for Proportion Differences

A $(1-\alpha)100\%$ confidence interval for $(p_1 - p_2)$: $(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}}$

Assumption: $n_1$ and $n_2$ must be sufficiently large so that the sampling distribution of $(\hat{p}_1 - \hat{p}_2)$ can be approximated by a normal distribution.

Example 2.8

A manufacturer of fly spray wished to compare two new sprays, 1 and 2. Two rooms of equal size, each containing 1000 flies, were used in the experiment. Room A was treated with spray 1 and room B with spray 2. A total of 825 and 760 flies succumbed to sprays 1 and 2, respectively. Estimate the difference in the rate of kill for the two sprays, and find a 90% confidence interval.

Solution

The point estimate of $(p_1 - p_2)$: $(\hat{p}_1 - \hat{p}_2) = 0.825 - 0.76 = 0.065$

The standard error:

$\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}} = \sqrt{\frac{(0.825)(0.175)}{1000} + \frac{(0.76)(0.24)}{1000}} = 0.01808$

A 90% CI for $(p_1 - p_2)$: $(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}}$

i.e.

$0.065 \pm 1.645(0.01808)$, or $0.065 \pm 0.0297$

Hence, we are 90% confident that the difference between the rates of kill lies between 0.035 and 0.095.
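The Example 2.8 interval can be checked with a short sketch. It estimates each kill rate from the counts, combines the two variances as in the formula above, and uses $z_{0.05}$ from `NormalDist().inv_cdf(0.95)`.

```python
import math
from statistics import NormalDist

killed1, killed2, n1, n2 = 825, 760, 1000, 1000
p1, p2 = killed1 / n1, killed2 / n2                       # 0.825, 0.76
estimate = p1 - p2                                        # 0.065
se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)   # about 0.0181
z = NormalDist().inv_cdf(0.95)                            # z_{0.05} for a 90% interval
lo, hi = estimate - z * se, estimate + z * se
print(round(estimate, 3), round(se, 5), round(lo, 3), round(hi, 3))
```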



2.10 Sample size

How many observations should be included in the sample? Unfortunately, we cannot answer this question without knowing how much information the experimenter wishes to buy.

Suppose we wish to estimate the mean daily yield and we would like the error of

estimation to be less than 4 tons with a probability of 0.95.

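Now, 95% of the sample means will lie within $1.96\sigma_{\bar{x}}$ of $\mu$ in repeated sampling; hence, we are asking that $1.96\sigma_{\bar{x}}$ equal 4 tons.

Thus,

$1.96\sigma_{\bar{x}} = 4$, i.e. $1.96\frac{\sigma}{\sqrt{n}} = 4$

Solving for $n$, we obtain

$n = \left(\frac{1.96\,\sigma}{4}\right)^2$, or $n \approx 0.24\,\sigma^2$

We assume that $n$ is large, so that the sample standard deviation $s = 21$ tons from Example 2.5 can be used to approximate $\sigma$.

Thus, $n = \left(\frac{1.96 \times 21}{4}\right)^2 = 105.9$. Rounding up, a sample size of 106 is needed.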

Procedure

Let $\theta$ be the parameter to be estimated, and let $\sigma_{\hat{\theta}}$ be the standard deviation of the point estimator $\hat{\theta}$. Then proceed as follows:

1) Choose $B$, the bound on the error of estimation, and a confidence coefficient $(1-\alpha)$.

2) Assume that $n$ is large, and solve the following equation for the sample size $n$:

$z_{\alpha/2}\,\sigma_{\hat{\theta}} = B$

where $z_{\alpha/2}$ is the value of $z$ with area $\alpha/2$ to its right.
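Applied to the sample mean, where $\sigma_{\hat{\theta}} = \sigma/\sqrt{n}$, the procedure gives $n = (z_{\alpha/2}\sigma/B)^2$, rounded up to the next whole number. The helper below is a hypothetical sketch of that case only. It reproduces the daily-yield calculation with $\sigma$ approximated by $s = 21$ tons.

```python
import math
from statistics import NormalDist

def sample_size_for_mean(sigma, bound, conf):
    """Smallest n with z_{alpha/2} * sigma / sqrt(n) <= bound, i.e. (z * sigma / B)^2 rounded up."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return math.ceil((z * sigma / bound) ** 2)

n = sample_size_for_mean(sigma=21, bound=4, conf=0.95)  # sigma approximated by s = 21 tons
print(n)  # 106
```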



2.11 The p-value

The smallest value of $\alpha$ for which the test results are statistically significant is often called the p-value or the observed significance level.

More formally, the p-value (probability value) is the probability of obtaining a result at least as extreme as the one that was observed, assuming that $H_0$ is true.
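To illustrate the definition, suppose that Example 2.2(b) is recast as a test of $H_0: p = 0.5$ against $H_a: p > 0.5$ with observed $z = 1.77$. This framing is an assumption, since no test has been set up in the notes. The p-value is then the upper-tail area beyond the observed $z$. For a two-sided alternative, it would be twice the tail area beyond $|z|$. In the minimal sketch below, the function name and the `alternative` labels are hypothetical.

```python
from statistics import NormalDist

def z_test_p_value(z, alternative="greater"):
    """p-value for an observed standard-normal test statistic z under H0."""
    Z = NormalDist()
    if alternative == "greater":
        return 1 - Z.cdf(z)
    if alternative == "less":
        return Z.cdf(z)
    if alternative == "two-sided":
        return 2 * (1 - Z.cdf(abs(z)))
    raise ValueError(f"unknown alternative: {alternative!r}")

print(round(z_test_p_value(1.77), 4))               # upper tail, as in Example 2.2(b)
print(round(z_test_p_value(1.77, "two-sided"), 4))
```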

Example
