Date post: | 08-Apr-2018 |
Upload: | chris-topher |
8/7/2019 Chap2 Sampling Distns I
2. SAMPLING DISTRIBUTION
2.1 Preliminaries
Numerical descriptive measures computed from the population measurements are called parameters. A statistic is a quantity calculated from the observations in a sample.
Population mean: $\mu$ and sample mean: $\bar{x}$
Population variance: $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$
Sample variance: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
The standard error of a statistic is the standard deviation of the sampling distribution of that statistic.
2.1.1 Introduction
In M25A, you were introduced to some useful random variables and their
probability distributions. In practical sampling situations, we select a sample of n
observations and use these measurements to calculate statistics such as the sample
mean and variance. These statistics are used to make inferences about the
corresponding parameters in the sampled population. Since the value of a statistic depends upon the observed values in the sample, a statistic is itself a random
variable that may be discrete or continuous. The probability distribution of a
statistic is called its sampling distribution, since it describes the behaviour of the
statistic in repeated sampling.
The sampling distribution of a statistic is the probability distribution for the
values of the statistic that results when random samples of size n are repeatedly
drawn from the population.
The sampling distribution may be derived mathematically or approximated
empirically. Empirical approximations are found by drawing a large number of
samples of size n from the specified population, calculating the value of the
statistic for each sample and tabulating the results in a relative frequency histogram.
When the number of samples is large, the relative frequency histogram should
closely approximate the theoretical sampling distribution.
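The empirical procedure described above can be sketched in Python. This is only an illustration under stated assumptions: the exponential population with mean 2, the sample size n = 5, the number of samples, the bin width and the helper names `simulate_statistic` and `relative_frequencies` are all hypothetical choices, not part of the notes.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible


def simulate_statistic(statistic, n, num_samples):
    """Draw num_samples samples of size n and return the statistic of each."""
    values = []
    for _ in range(num_samples):
        # Assumed population: exponential with mean 2 (purely illustrative).
        sample = [random.expovariate(0.5) for _ in range(n)]
        values.append(statistic(sample))
    return values


def relative_frequencies(values, bin_width):
    """Tabulate values into equal-width bins as relative frequencies."""
    counts = {}
    for v in values:
        left_edge = bin_width * (v // bin_width)
        counts[left_edge] = counts.get(left_edge, 0) + 1
    return {edge: c / len(values) for edge, c in sorted(counts.items())}


means = simulate_statistic(statistics.mean, n=5, num_samples=10_000)
histogram = relative_frequencies(means, bin_width=0.5)

# Text version of the relative frequency histogram of the sample mean.
for edge, freq in histogram.items():
    print(f"[{edge:4.1f}, {edge + 0.5:4.1f})  {freq:.3f}")
```

Increasing `num_samples` should make the tabulated relative frequencies settle towards the theoretical sampling distribution of the sample mean for the assumed population.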
2.2 Sampling Distributions
Consider a random sample of size $n = 3$ drawn without replacement from a
population of $N = 5$ elements. A simple random sample of size $n$ is selected in
such a way that every sample of size $n$ has the same probability of being selected,
equal to $1 / {}^{N}C_{n}$, where ${}^{N}C_{n}$ is the number of possible samples (here ${}^{5}C_{3} = 10$).
Suppose we have a population of $N = 5$ elements whose values are 3, 6, 9, 12 and
15. Since there are five distinct elements, the population probability distribution is:
$p(x) = \frac{1}{5}$ for $x = 3, 6, 9, 12, 15$
Sample   Sample values   $\bar{x}$   $m$
1        3, 6, 9         6           6
2        3, 6, 12        7           6
3        3, 6, 15        8           6
4        3, 9, 12        8           9
5        3, 9, 15        9           9
6        3, 12, 15       10          12
7        6, 9, 12        9           9
8        6, 9, 15        10          9
9        6, 12, 15       11          12
10       9, 12, 15       12          12
The table above shows the values of $\bar{x}$ and $m$ (the median) associated with each
sample; each sample is assigned probability equal to $\frac{1}{10}$. So, we will observe a value of
$\bar{x} = 6$ only if sample 1 is selected, and this occurs with probability 0.1. A value of
$\bar{x} = 8$ will occur if sample 3 or sample 4 is drawn; therefore, the probability of
observing $\bar{x} = 8$ is 0.2.
Hence, the sampling distribution of $\bar{x}$ is shown below.
$\bar{x}$   $p(\bar{x})$
6           0.1
7           0.1
8           0.2
9           0.2
10          0.2
11          0.1
12          0.1
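The enumeration above can be reproduced with a short Python sketch. It lists every sample of size 3 drawn without replacement from the five population values, assigns each sample probability 1/10, and accumulates the sampling distributions of the mean and the median. The variable names are illustrative choices only.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations
from statistics import median

population = [3, 6, 9, 12, 15]
n = 3

# All possible samples of size n without replacement: 5C3 = 10 of them.
samples = list(combinations(population, n))
prob = Fraction(1, len(samples))  # each sample is equally likely

mean_dist = Counter()
median_dist = Counter()
for s in samples:
    mean_dist[Fraction(sum(s), n)] += prob
    median_dist[median(s)] += prob

print("mean  probability")
for value in sorted(mean_dist):
    print(f"{float(value):4.0f}  {float(mean_dist[value]):.1f}")

print("median  probability")
for value in sorted(median_dist):
    print(f"{value:6d}  {float(median_dist[value]):.1f}")
```

Exact fractions are used so that the probabilities add to exactly 1; the same loop works for any other statistic of interest.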
2.3 Central Limit Theorem
If random samples of $n$ observations are drawn from a non-normal population
with finite mean $\mu$ and standard deviation $\sigma$, then when $n$ is large, the sampling
distribution of the sample mean $\bar{x}$ is approximately normally distributed, with
mean and standard deviation:
$\mu_{\bar{x}} = \mu$ and $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
Diagrams
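A small simulation can illustrate the theorem. The sketch below assumes a uniform population on (0, 1), which is non-normal with $\mu = 0.5$ and $\sigma = \sqrt{1/12}$; the sample size of 30 and the number of repetitions are arbitrary choices made for the illustration.

```python
import random
import statistics
from math import sqrt

random.seed(3)  # fixed seed so the sketch is reproducible

n = 30               # observations per sample (assumed)
num_samples = 5000   # number of repeated samples (assumed)
mu = 0.5             # mean of the uniform(0, 1) population
sigma = sqrt(1 / 12) # standard deviation of the uniform(0, 1) population

sample_means = [
    statistics.mean([random.random() for _ in range(n)])
    for _ in range(num_samples)
]

observed_mean = statistics.mean(sample_means)
observed_se = statistics.stdev(sample_means)
theoretical_se = sigma / sqrt(n)

print(f"mean of sample means: {observed_mean:.4f} (theory {mu})")
print(f"sd of sample means:   {observed_se:.4f} (theory {theoretical_se:.4f})")
```

The simulated mean and standard deviation of $\bar{x}$ should sit close to $\mu$ and $\sigma / \sqrt{n}$, and a histogram of `sample_means` should look roughly bell-shaped.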
2.4 Sampling distribution: Sample Mean
The standard deviation of a statistic used as an estimator of a population parameter is often
called the standard error of the estimator, since it refers to the precision of the estimator.
Thus, the standard deviation of $\bar{x}$ is referred to as the standard error (s.e.) of the mean.
Example 2.1
Suppose that you select a random sample of $n = 25$ observations from a
population with mean $\mu = 8$ and $\sigma = 0.6$. Find the probability that the sample
mean $\bar{x}$ will
a) be less than 7.9 b) exceed 7.9 c) lie within 0.1 of $\mu = 8$
Solution
a) Since $n = 25$ is relatively large, the sampling distribution of $\bar{x}$ is
approximately normal by the CLT.
Now, $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{0.6}{\sqrt{25}} = 0.12$
$P(\bar{x} < 7.9) = P\left(\frac{\bar{x} - \mu}{\sigma_{\bar{x}}} < \frac{7.9 - 8.0}{0.12}\right) = P(Z < -0.83) = 0.2033$
b) $P(\bar{x} > 7.9) = P(Z > -0.83) = 0.7967$
c) $P(7.9 < \bar{x} < 8.1) = P(-0.83 < Z < 0.83) = 0.7967 - 0.2033 = 0.5934$
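The three probabilities in Example 2.1 can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library as the normal approximation to the distribution of $\bar{x}$. Because it does not round $z$ to two decimals, its results differ from the table-based values in the third decimal place.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 8.0, 0.6, 25
se = sigma / sqrt(n)            # standard error of the mean, 0.12
xbar_dist = NormalDist(mu, se)  # CLT approximation for the sample mean

p_less = xbar_dist.cdf(7.9)                          # a) P(xbar < 7.9)
p_more = 1 - p_less                                  # b) P(xbar > 7.9)
p_within = xbar_dist.cdf(8.1) - xbar_dist.cdf(7.9)   # c) P(7.9 < xbar < 8.1)

print(f"s.e. = {se:.2f}")
print(f"a) {p_less:.4f}  b) {p_more:.4f}  c) {p_within:.4f}")
```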
2.5 Sampling distribution: Sample Proportion
Consider a sampling problem involving consumer preferences or an opinion poll, in which we
are concerned with estimating the proportion $p$ of the people in the population
who possess some specific characteristic. These are practical examples of
binomial experiments, provided the sampling procedure has been conducted in the appropriate
manner.
(i) If a random sample of $n$ observations is selected from a binomial population with parameter $p$, then the sampling distribution of the sample proportion
$\hat{p} = \frac{x}{n}$ will have mean $\mu_{\hat{p}} = p$ and standard deviation $\sigma_{\hat{p}} = \sqrt{\frac{pq}{n}}$, where $q = 1 - p$.
[Figure: the probability that $\bar{x}$ lies within 0.1 of $\mu$]
(ii) When the sample size is large, the sampling distribution of $\hat{p}$ can be approximated by a normal distribution. The approximation will be
adequate if $p - 2\sigma_{\hat{p}}$ and $p + 2\sigma_{\hat{p}}$ fall in the interval 0 to 1.
(iii) A rule of thumb for the approximation to be satisfactory is that $np > 5$ and $npq > 5$.
(iv) We use this normal approximation to evaluate the probability that the binomial variable $Y$ is less than or greater than a particular value $y$. This
$y$ is an integer, so we must take account of the fact that we are approximating a
discrete random variable $Y$ by a continuous random variable $X$. So, we
think of the probability mass corresponding to the value $y$ as being spread
over the interval $\left(y - \frac{1}{2}, y + \frac{1}{2}\right)$.
Hence, using a continuity correction:
$P(Y \le y) \approx P\left(X \le y + \frac{1}{2}\right)$, i.e. adding a half
$P(Y \ge y) \approx P\left(X \ge y - \frac{1}{2}\right)$, i.e. subtracting a half
When $X$ is continuous, $P(X = x) = 0$ for any $x$; we can specify probabilities
for intervals only, not at points. However, using a normal approximation we
can take $P(Y = y)$ to be equal to $P\left(y - \frac{1}{2} \le X \le y + \frac{1}{2}\right)$. So, $P(Y \ge y)$ and
$P(Y > y)$ are not the same; they differ by an amount equal to $P(Y = y)$.
Hence, using the continuity correction:
$P(Y < y) \approx P\left(X \le y - \frac{1}{2}\right)$
$P(Y > y) \approx P\left(X \ge y + \frac{1}{2}\right)$
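The effect of the continuity correction can be seen by comparing an exact binomial probability with the normal approximation, with and without the added half. The values $n = 50$, $p = 0.4$ and $y = 22$ below are hypothetical and chosen only for illustration; they do not come from the notes.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p, y = 50, 0.4, 22  # hypothetical values for illustration
q = 1 - p

# Exact binomial probability P(Y <= y).
exact = sum(comb(n, k) * p**k * q ** (n - k) for k in range(y + 1))

# Normal approximation X with the same mean and standard deviation as Y.
X = NormalDist(n * p, sqrt(n * p * q))
uncorrected = X.cdf(y)       # P(X <= y)
corrected = X.cdf(y + 0.5)   # P(X <= y + 1/2), i.e. adding a half

print(f"exact       {exact:.4f}")
print(f"uncorrected {uncorrected:.4f}")
print(f"corrected   {corrected:.4f}")
```

For these values the corrected approximation lands noticeably closer to the exact probability than the uncorrected one.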
Example 2.2
A survey was made of 313 children, aged 14 to 22, of the nation's top corporate
executives. When asked to identify the best aspect of being privileged in this
group, 55% mentioned material and financial gains.
a) Describe the sampling distribution of the sample proportion $\hat{p}$.
b) Assume that the population proportion is 0.5; what is the probability of
observing a sample proportion as large as or larger than $\hat{p} = 0.55$?
Solution
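a) Since the sample size is large, the distribution of $\hat{p}$ is approximately normal,
with mean $p$, estimated by $\hat{p} = 0.55$, and standard deviation
$\sigma_{\hat{p}} = \sqrt{\frac{pq}{n}} \approx \sqrt{\frac{\hat{p}\hat{q}}{n}} = \sqrt{\frac{0.55 \times 0.45}{313}} = 0.028$
Therefore, we know that approximately 95% of the time $\hat{p}$ will fall within
$2\sigma_{\hat{p}} \approx 0.056$ of the unknown value of $p$.
One could check the condition that allows the normal approximation to the distribution of $\hat{p}$, i.e.
$\hat{p} \pm 2\sigma_{\hat{p}} = 0.55 \pm 0.056$, or 0.494 to 0.606, which falls in the interval 0 to 1.
b) We are given that $\mu_{\hat{p}} = p = 0.5$ and
$\sigma_{\hat{p}} = \sqrt{\frac{pq}{n}} = \sqrt{\frac{0.5 \times 0.5}{313}} = 0.0283$
$P(\hat{p} \ge 0.55) = P\left(Z \ge \frac{0.55 - 0.5}{0.0283}\right) = P(Z \ge 1.77) = 0.0384$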
[Figure: the sampling distribution of $\hat{p}$ based on a sample of $n = 313$ children]
This tells us that if we were to select a random sample of $n = 313$ observations from a
population with proportion $p = 0.5$, the probability that the sample proportion $\hat{p}$ would be as
large as or larger than 0.55 is only about 4%.
Alternatively, using the continuity correction, the equivalent of 0.5 on the proportion scale is $\frac{1}{2n} = 0.0016$.
So:
$P(\hat{p} \ge 0.55) \approx P\left(Z \ge \frac{0.55 - 0.0016 - 0.5}{0.0283}\right) = P(Z \ge 1.71) = 0.0436$
When n is large, the effect of using the correction is generally negligible.
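Part b) of Example 2.2 can be checked with a few lines of Python, with and without the continuity correction. This is a sketch using `statistics.NormalDist` for the standard normal distribution; small differences from the hand calculation come from rounding $z$ to two decimals when reading tables.

```python
from math import sqrt
from statistics import NormalDist

n, p0, p_hat = 313, 0.5, 0.55
se = sqrt(p0 * (1 - p0) / n)  # standard deviation of p-hat when p = 0.5
Z = NormalDist()              # standard normal distribution

# Without continuity correction.
p_plain = 1 - Z.cdf((p_hat - p0) / se)

# With continuity correction: shift the boundary by 1/(2n).
p_corrected = 1 - Z.cdf((p_hat - 1 / (2 * n) - p0) / se)

print(f"s.e. = {se:.4f}")
print(f"P(p-hat >= 0.55) without correction: {p_plain:.4f}")
print(f"P(p-hat >= 0.55) with correction:    {p_corrected:.4f}")
```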
2.6 Sampling distribution: Sum or Difference between two sample means
When independent random samples of $n_1$ and $n_2$ observations have been
selected from populations with means $\mu_1$ and $\mu_2$, and variances $\sigma_1^2$ and $\sigma_2^2$
respectively, the sampling distribution of the sum or difference will have the
following properties:
(a) The mean and standard deviation of $(\bar{x}_1 \pm \bar{x}_2)$: $\mu_{(\bar{x}_1 \pm \bar{x}_2)} = \mu_1 \pm \mu_2$ and $\sigma_{(\bar{x}_1 \pm \bar{x}_2)} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$
(b) If the sampled populations are normally distributed, then the sampling distribution is exactly normally distributed regardless of the sample size.
(c) If the sampled populations are not normally distributed, then the sampling distribution is approximately normally distributed when the sample sizes are
large, due to the CLT.
Example 2.3
Random samples of 40 teachers each were selected from high schools in Kingston
and in St Ann. What is the probability that the sample mean salary from Kingston
will exceed the sample mean salary from St Ann by $1500 or more, given that the
Kingston mean salary is $29,000, the St Ann mean salary is $28,621, and the standard deviations of the two salary populations are $5000 and $4700 respectively?
Solution
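Let $\bar{x}_1$ be the sample mean salary for Kingston and $\bar{x}_2$ be the sample mean salary for St Ann; also, let
$\sigma_1$ and $\sigma_2$ be the respective population standard deviations.
Given that $\mu_1 = 29{,}000$, $\mu_2 = 28{,}621$, $\sigma_1 = 5000$ and $\sigma_2 = 4700$,
then
$\mu_{(\bar{x}_1 - \bar{x}_2)} = \mu_1 - \mu_2 = 29{,}000 - 28{,}621 = 379$
$\sigma_{(\bar{x}_1 - \bar{x}_2)} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = \sqrt{\frac{5000^2}{40} + \frac{4700^2}{40}} = 1085.0115$
Since the sample sizes are large, we can use the normal approximation:
$P(\bar{x}_1 - \bar{x}_2 > 1500) = P\left(Z > \frac{1500 - 379}{1085.0115}\right)$
$= P(Z > 1.03) = 1 - P(Z \le 1.03) = 1 - 0.8485 = 0.1515$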
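The calculation in Example 2.3 can be reproduced as follows. The sketch treats the stated means and standard deviations as population values and uses `statistics.NormalDist` for the normal approximation to the difference of the two sample means; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu1, mu2 = 29_000, 28_621   # population mean salaries (Kingston, St Ann)
sigma1, sigma2 = 5000, 4700 # population standard deviations
n1 = n2 = 40                # teachers sampled in each parish

mean_diff = mu1 - mu2
se_diff = sqrt(sigma1**2 / n1 + sigma2**2 / n2)

# Normal approximation to the sampling distribution of the difference.
diff_dist = NormalDist(mean_diff, se_diff)
p_exceeds = 1 - diff_dist.cdf(1500)

print(f"mean of difference = {mean_diff}")
print(f"s.e. of difference = {se_diff:.4f}")
print(f"P(difference > 1500) = {p_exceeds:.4f}")
```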
[Figure: the sampling distribution of $\bar{x}_1 - \bar{x}_2$]
2.7 Sampling distribution: Difference between two sample proportions
Assume that independent random samples of $n_1$ and $n_2$ observations have
been selected from binomial populations with parameters $p_1$ and $p_2$ respectively.
Then the sampling distribution of the difference between sample proportions
$(\hat{p}_1 - \hat{p}_2) = \frac{x_1}{n_1} - \frac{x_2}{n_2}$
will have the following properties:
(a) The mean and standard deviation of $(\hat{p}_1 - \hat{p}_2)$: $\mu_{(\hat{p}_1 - \hat{p}_2)} = p_1 - p_2$ and $\sigma_{(\hat{p}_1 - \hat{p}_2)} = \sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}$
(b) The sampling distribution of $(\hat{p}_1 - \hat{p}_2)$ can be approximated by a normal distribution when both sample sizes are large, due to the CLT.
(c) When we use a normal distribution to approximate binomial probabilities, the interval $(p_1 - p_2) \pm 2\sigma_{(\hat{p}_1 - \hat{p}_2)}$ should lie within $-1$ to $1$.
Example 2.4
A local newspaper reported that 75% of the residents in the developing section
and 60% of the residents in other parts of the city favour passage of a proposed
bond issue to build a new school. Random samples of $n_1 = 50$ residents in the
developing section of the city and $n_2 = 100$ residents in other parts of the city are
selected, and the residents in the samples are asked whether they favour the bond
proposal. What is the probability that the difference in magnitude between the
sample proportions favouring the bond proposal does not exceed 10%?
Solution
Let us assume that $p_1 = 0.75$ and $p_2 = 0.60$, and that the sampling distribution of
the difference between proportions is approximately normal.
So, $\mu_{(\hat{p}_1 - \hat{p}_2)} = p_1 - p_2 = 0.75 - 0.60 = 0.15$
and
$\sigma_{(\hat{p}_1 - \hat{p}_2)} = \sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}} = \sqrt{\frac{0.75 \times 0.25}{50} + \frac{0.6 \times 0.4}{100}} = 0.0784$
We wish to find $P(-0.1 < \hat{p}_1 - \hat{p}_2 < 0.1)$.
Hence,
$P(-0.1 < \hat{p}_1 - \hat{p}_2 < 0.1) = P\left(\frac{-0.1 - 0.15}{0.0784} < Z < \frac{0.1 - 0.15}{0.0784}\right) = P(-3.19 < Z < -0.64) = 0.2611 - 0.0007 = 0.2604$
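The probability sought in Example 2.4 can be evaluated numerically. The sketch below assumes, as the solution does, that $p_1 = 0.75$ and $p_2 = 0.60$ are the true proportions and that the difference of sample proportions is approximately normal. Because $z$ is not rounded, the result differs slightly from a table-based value.

```python
from math import sqrt
from statistics import NormalDist

p1, p2 = 0.75, 0.60
n1, n2 = 50, 100

mean_diff = p1 - p2
se_diff = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

# Normal approximation to the distribution of p1-hat minus p2-hat.
diff_dist = NormalDist(mean_diff, se_diff)
p_within = diff_dist.cdf(0.1) - diff_dist.cdf(-0.1)

z_low = (-0.1 - mean_diff) / se_diff
z_high = (0.1 - mean_diff) / se_diff

print(f"s.e. = {se_diff:.4f}")
print(f"z limits: {z_low:.2f} to {z_high:.2f}")
print(f"P(-0.1 < difference < 0.1) = {p_within:.4f}")
```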
2.8 Large-Sample Estimation
Since populations are characterised by numerical descriptive measures called
parameters, statistical inference is concerned with making inferences about
population parameters. Methods for making inferences about parameters fall into
one of two categories. We may make decisions concerning the value of the
parameter, or we may estimate or predict the value of the parameter. Which
method of inference should be used; that is, should the parameter be estimated or
should we test a hypothesis concerning its value?
Estimation procedures can be divided into two types, point estimation and
interval estimation.
An estimator is a statistic used to estimate a population parameter; it is a function
of the sample observations.
An estimate is the value an estimator takes for a particular sample. It is also called a
point estimate.
An interval estimator of a population parameter tells us how to calculate two
numbers based on sample data, forming an interval within which the parameter is
expected to lie. This pair of numbers is called an interval estimate or confidence
interval.
Suppose we let $\hat{\theta}$ denote an estimator of the population parameter $\theta$ ($\mu$, $\sigma$, or
any other parameter). We would like our estimator to be unbiased and the spread of
the sampling distribution of the estimator to be as small as possible.
The distance between the estimate and the parameter is called the error of
estimation.
The probability that a confidence interval will enclose the estimated parameter is
called the confidence coefficient.
A good confidence interval is one that is as narrow as possible and has a large
confidence coefficient, near 1. The narrower the interval, the more exactly we
have located the estimated parameter.
Suppose we want to estimate the mean number of bacteria per cubic centimetre in
a polluted stream. If we draw 10 samples, each containing $n = 30$ observations,
and construct a confidence interval for the population mean $\mu$ for each sample, the
intervals might appear as shown in the diagram below.
The horizontal line segments represent the ten intervals and the vertical line
represents the location of the true mean number of bacteria per cubic cm. The
parameter $\mu$ is fixed, while the interval location and width may vary from sample
to sample. Thus, we speak of the probability that the interval encloses $\mu$, not
the probability that $\mu$ falls in the interval, because $\mu$ is fixed. The interval is random.
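A $(1 - \alpha)100\%$ confidence interval for $\theta$: $\hat{\theta} \pm z_{\alpha/2}\,\sigma_{\hat{\theta}}$
where $z_{\alpha/2}$ is the $z$ value corresponding to an area $\alpha/2$ in the upper tail of a
standard normal distribution.
Also, $z_{\alpha/2}\,\sigma_{\hat{\theta}}$ is the bound on the error of estimation;
$\hat{\theta} + z_{\alpha/2}\,\sigma_{\hat{\theta}}$ is called the upper confidence limit and $\hat{\theta} - z_{\alpha/2}\,\sigma_{\hat{\theta}}$ is called the lower
confidence limit.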
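The repeated-sampling meaning of the confidence coefficient can be illustrated by simulation. The sketch below assumes a hypothetical normal population with $\mu = 50$ and known $\sigma = 10$ (values invented for the illustration), builds a 95% interval $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$ from each of many samples, and records how often the interval encloses $\mu$.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

random.seed(2)  # fixed seed so the sketch is reproducible

mu, sigma, n = 50.0, 10.0, 30  # hypothetical population and sample size
alpha = 0.05
z = NormalDist().inv_cdf(1 - alpha / 2)  # z value with area alpha/2 to its right
half_width = z * sigma / sqrt(n)         # bound on the error of estimation


def interval_encloses_mu():
    """Draw one sample and report whether its interval encloses mu."""
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = mean(sample)
    return xbar - half_width <= mu <= xbar + half_width


trials = 2000
coverage = sum(interval_encloses_mu() for _ in range(trials)) / trials
print(f"z = {z:.3f}, proportion of intervals enclosing mu = {coverage:.3f}")
```

The reported proportion should be close to 0.95: the parameter stays fixed while the intervals move from sample to sample.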
[Figure: ten confidence intervals for the mean number of bacteria per cubic cm, each based on a sample of $n = 30$ observations]
2.9 Confidence Interval (CI) for Population Mean
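A $(1 - \alpha)100\%$ confidence interval for $\mu$: $\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$
Note: If $\sigma$ is unknown, it can be approximated by the sample standard deviation $s$ when the sample size is large.
Remark: If you want a confidence coefficient $(1 - \alpha)$ equal to 0.95, then the tail-end
area is $\alpha = 0.05$ and half of $\alpha$ is placed in each tail of the distribution. Some of
the commonly used confidence coefficients are shown in the table below.
Confidence coefficient $(1 - \alpha)$   $\alpha$   $z_{\alpha/2}$   LCL                                      UCL
0.90                                    0.10       1.645            $\bar{x} - 1.645\frac{\sigma}{\sqrt{n}}$   $\bar{x} + 1.645\frac{\sigma}{\sqrt{n}}$
0.95                                    0.05       1.96             $\bar{x} - 1.96\frac{\sigma}{\sqrt{n}}$    $\bar{x} + 1.96\frac{\sigma}{\sqrt{n}}$
0.99                                    0.01       2.58             $\bar{x} - 2.58\frac{\sigma}{\sqrt{n}}$    $\bar{x} + 2.58\frac{\sigma}{\sqrt{n}}$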
Example 2.5
Suppose that we wish to estimate the mean daily yield of a chemical manufactured
in a chemical plant. The daily yield, recorded for 50 days, produced a mean and
standard deviation of $\bar{x} = 871$ tons and $s = 21$ tons. Find a 90% confidence
interval for the population mean.
Solution
A 90% CI for $\mu$: $\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$
where $\alpha = 0.1$ and $z_{\alpha/2} = z_{0.05} = 1.645$
hence, $871 \pm 1.645\frac{21}{\sqrt{50}}$
or $871 \pm 4.89$
Interpretation:
i. We estimate the mean daily yield to be no more than 875.89 tons and no less than 866.11 tons.
ii. In repeated sampling, 90% of the confidence intervals similarly formed will enclose the true value of $\mu$.
iii. We estimate that the mean daily yield lies in the interval from 866.11 to 875.89 tons.
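The interval in Example 2.5 can be computed directly. The helper `mean_confidence_interval` below is a hypothetical name introduced for this sketch; it obtains $z_{\alpha/2}$ from `statistics.NormalDist().inv_cdf` rather than from a table, and it uses $s$ in place of $\sigma$ as the notes allow for large samples.

```python
from math import sqrt
from statistics import NormalDist


def mean_confidence_interval(xbar, sd, n, confidence):
    """Large-sample CI for a mean: xbar +/- z * sd / sqrt(n)."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # area alpha/2 in the upper tail
    half_width = z * sd / sqrt(n)
    return xbar - half_width, xbar + half_width


lower, upper = mean_confidence_interval(xbar=871, sd=21, n=50, confidence=0.90)
print(f"90% CI for the mean daily yield: {lower:.2f} to {upper:.2f} tons")
```

Calling the same function with `confidence=0.95` or `0.99` gives the wider intervals corresponding to the other rows of the table of confidence coefficients.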
Confidence Interval for difference between two means
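A $(1 - \alpha)100\%$ confidence interval for $(\mu_1 - \mu_2)$:
$(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$
Note: If $\sigma_1^2$ and $\sigma_2^2$ are unknown, but both $n_1$ and $n_2$ are greater than or equal to
30, you can use the sample variances $s_1^2$ and $s_2^2$ to estimate $\sigma_1^2$ and $\sigma_2^2$.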
Example 2.6
A comparison of the wearing quality of two types of automobile tyres was obtained
by road-testing samples of $n_1 = n_2 = 100$ tyres of each type. The number of miles
until wear-out was recorded. Estimate the difference in mean miles to wear-out,
find the standard error and find a 99% CI.
Tyre 1: $\bar{x}_1 = 26{,}400$ miles and $s_1^2 = 1{,}440{,}000$
Tyre 2: $\bar{x}_2 = 25{,}100$ miles and $s_2^2 = 1{,}960{,}000$
Solution
The point estimate of $(\mu_1 - \mu_2)$ is $(\bar{x}_1 - \bar{x}_2) = 26{,}400 - 25{,}100 = 1300$ miles.
The standard error (s.e.) of $(\bar{x}_1 - \bar{x}_2)$ is
$\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \approx \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = \sqrt{\frac{1{,}440{,}000}{100} + \frac{1{,}960{,}000}{100}} = 184.4$
A 99% CI for $(\mu_1 - \mu_2)$: $(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$
i.e.
$1300 \pm (2.58)(184.4)$ or $1300 \pm 475.752$
Hence, we are 99% confident that the difference in mean miles to wear-out
lies between 824.2 and 1775.8.
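Example 2.6 can be checked with the sketch below. The function name `two_mean_ci` is hypothetical. The code uses the more precise $z_{0.005} \approx 2.576$ from `NormalDist().inv_cdf` instead of the rounded 2.58, so its limits differ from the hand calculation by a little under one mile.

```python
from math import sqrt
from statistics import NormalDist


def two_mean_ci(xbar1, xbar2, var1, var2, n1, n2, confidence):
    """Large-sample CI for mu1 - mu2 using sample variances."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = sqrt(var1 / n1 + var2 / n2)  # standard error of the difference
    diff = xbar1 - xbar2              # point estimate of mu1 - mu2
    return diff, se, diff - z * se, diff + z * se


diff, se, lower, upper = two_mean_ci(
    xbar1=26_400, xbar2=25_100,
    var1=1_440_000, var2=1_960_000,
    n1=100, n2=100, confidence=0.99,
)
print(f"point estimate = {diff} miles, s.e. = {se:.1f}")
print(f"99% CI: {lower:.1f} to {upper:.1f} miles")
```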
Confidence Interval for Proportion
A $(1 - \alpha)100\%$ confidence interval for $p$: $\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}$
Example 2.7
A random sample of 100 voters in a community produced $x = 59$ voters
favouring candidate J. Find a point estimate of the population proportion favouring candidate J,
and a 95% confidence interval for the population proportion.
Solution
A point estimate for $p$ is
$\hat{p} = \frac{x}{n} = \frac{59}{100} = 0.59$
A 95% CI for $p$: $\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}$
i.e. $0.59 \pm 1.96(0.049)$ or $0.59 \pm 0.09604$
Therefore, in repeated sampling, 95% of the confidence intervals calculated this
way will enclose the true value of $p$.
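A short sketch of the calculation in Example 2.7 follows. The function name `proportion_ci` is hypothetical. Since the standard error is not rounded to 0.049 before multiplying, the half-width comes out as roughly 0.0964 rather than 0.09604.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(x, n, confidence):
    """Large-sample CI for a proportion: p-hat +/- z * sqrt(p-hat * q-hat / n)."""
    p_hat = x / n
    se = sqrt(p_hat * (1 - p_hat) / n)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return p_hat, se, p_hat - z * se, p_hat + z * se


p_hat, se, lower, upper = proportion_ci(x=59, n=100, confidence=0.95)
print(f"p-hat = {p_hat:.2f}, s.e. = {se:.3f}")
print(f"95% CI: {lower:.3f} to {upper:.3f}")
```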
Confidence Interval for Proportion Differences
A $(1 - \alpha)100\%$ confidence interval for $(p_1 - p_2)$:
$(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}}$
Assumption: $n_1$ and $n_2$ must be sufficiently large so that the sampling
distribution of $(\hat{p}_1 - \hat{p}_2)$ can be approximated by a normal distribution.
Example 2.8
A manufacturer of fly spray wished to compare two new sprays, 1 and 2. Two
rooms of equal size, each containing 1000 flies, were used in the experiment.
Room A was treated with spray 1 and room B with spray 2. A total of 825 and
760 flies succumbed to sprays 1 and 2 respectively. Estimate the difference in the
rate of kill for the two sprays and find a 90% confidence interval.
Solution
The point estimate of $(p_1 - p_2)$: $(\hat{p}_1 - \hat{p}_2) = 0.825 - 0.76 = 0.065$
The standard error:
$\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}} = \sqrt{\frac{(0.825)(0.175)}{1000} + \frac{(0.76)(0.24)}{1000}} = 0.01808$
A 90% CI for $(p_1 - p_2)$: $(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}}$
i.e.
$0.065 \pm 1.645(0.01808)$ or $0.065 \pm 0.0297$
Hence, we are 90% confident that the difference between the rates of kill lies
between 0.035 and 0.095.
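The standard error and interval in Example 2.8 can be recomputed with the sketch below, which is a convenient way to compare the printed values against the hand calculation. The function name `two_proportion_ci` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_ci(x1, n1, x2, n2, confidence):
    """Large-sample CI for p1 - p2 from two independent binomial samples."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    se = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    diff = p1_hat - p2_hat
    return diff, se, diff - z * se, diff + z * se


diff, se, lower, upper = two_proportion_ci(
    x1=825, n1=1000, x2=760, n2=1000, confidence=0.90
)
print(f"point estimate = {diff:.3f}, s.e. = {se:.5f}")
print(f"90% CI: {lower:.3f} to {upper:.3f}")
```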
2.10 Sample size
How many observations should be included in the sample? Unfortunately, we
cannot answer this question without knowing how much information the experimenter wishes to
buy.
Suppose we wish to estimate the mean daily yield $\mu$ and we would like the error of
estimation to be less than 4 tons with a probability of 0.95.
Now 95% of the sample means will lie within $1.96\sigma_{\bar{x}}$ of $\mu$ in repeated sampling;
hence, we are asking that $1.96\sigma_{\bar{x}}$ equal 4 tons.
Thus,
$1.96\sigma_{\bar{x}} = 4$, i.e. $1.96\frac{\sigma}{\sqrt{n}} = 4$
Solving for $n$, we obtain
$n = \left(\frac{1.96}{4}\right)^2 \sigma^2$
or $n = 0.24\sigma^2$
We assume that $n$ is very large and $s \approx \sigma$, with $s = 21$ tons as in Example 2.5.
Thus, $n = \left(\frac{1.96}{4}\right)^2 (21)^2 \approx 105.9$. Hence, a sample size of 106 is required.
Procedure
Let $\theta$ be the parameter to be estimated and let $\sigma_{\hat{\theta}}$ be the standard deviation of
the point estimator. Then proceed as follows:
1) Choose $B$, the bound on the error of estimation, and a confidence coefficient $(1 - \alpha)$.
2) Assume that $n$ is large; solve the following equation for the sample size $n$:
$z_{\alpha/2}\,\sigma_{\hat{\theta}} = B$
where $z_{\alpha/2}$ is the value of $z$ having area $\alpha/2$ to its right.
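For the case of estimating a mean, where $\sigma_{\hat{\theta}} = \sigma/\sqrt{n}$, the procedure reduces to $n = (z_{\alpha/2}\,\sigma / B)^2$, rounded up to the next whole number. The sketch below applies this to the daily-yield illustration; the function name `sample_size_for_mean` is hypothetical, and $\sigma$ is approximated by $s = 21$ as in the worked example.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_mean(sigma, bound, confidence):
    """Smallest n with z * sigma / sqrt(n) <= bound, for a large-sample mean."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z * sigma / bound) ** 2)


n_required = sample_size_for_mean(sigma=21, bound=4, confidence=0.95)
print(f"required sample size: {n_required}")
```

Halving the bound `B` roughly quadruples the required sample size, since $n$ grows with $1/B^2$.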
2.11 The p-value
The smallest value of $\alpha$ for which the test results are statistically significant is
often called the p-value or the observed significance level.
More formally, the p-value (probability value) is the probability of obtaining a
result at least as extreme as the one that was observed, assuming that $H_0$ is true.
Example