
Chap2 Sampling Distns I

Date post: 08-Apr-2018
Upload: chris-topher
  • 8/7/2019 Chap2 Sampling Distns I


    2. SAMPLING DISTRIBUTION

    2.1 Preliminaries

    Numerical descriptive measures computed from the population measurements are called parameters. A statistic is a quantity calculated from the observations in a sample.

    Population mean: $\mu$ and sample mean: $\bar{x}$

    Population variance: $$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$

    Sample variance: $$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

    The standard error of a statistic is the standard deviation of the sampling distribution of that statistic.

    2.1.1 Introduction

    In M25A, you were introduced to some useful random variables and their

    probability distributions. In practical sampling situations, we select a sample of n

    observations and use these measurements to calculate statistics such as the sample

    mean and variance. These statistics are used to make inferences about the

    corresponding parameters in the sampled population. Since the value of a statistic depends upon the observed values in the sample, a statistic is itself a random

    variable that may be discrete or continuous. The probability distribution of a

    statistic is called its sampling distribution, since it describes the behaviour of the

    statistic in repeated sampling.

    The sampling distribution of a statistic is the probability distribution for the

    values of the statistic that results when random samples of size n are repeatedly

    drawn from the population.

    The sampling distribution may be derived mathematically or approximated

    empirically. Empirical approximations are found by drawing a large number of

    samples of size n from the specified population, calculating the value of the

    statistic for each sample and tabulating the results in a relative frequency histogram.

    When the number of samples is large, the relative frequency histogram should

    closely approximate the theoretical sampling distribution.
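The empirical approach described above can be sketched in a few lines of Python. This is only an illustration: the population (the faces of a fair die), the sample size and the number of repetitions are assumptions chosen for the sketch, not values from these notes.

```python
import random
from collections import Counter


def empirical_sampling_distribution(population, n, num_samples, seed=0):
    """Relative frequencies of the sample mean over repeated random samples."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(num_samples):
        # Draw one sample of size n (with replacement) and record its mean.
        sample = [rng.choice(population) for _ in range(n)]
        counts[round(sum(sample) / n, 4)] += 1
    return {mean: c / num_samples for mean, c in sorted(counts.items())}


# Hypothetical population: the six faces of a fair die.
population = [1, 2, 3, 4, 5, 6]
dist = empirical_sampling_distribution(population, n=2, num_samples=20000)

for mean, rel_freq in dist.items():
    print(f"{mean:4.1f}  {rel_freq:.3f}")
```

Increasing `num_samples` should bring the printed relative frequencies closer to the theoretical sampling distribution.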


    2.2 Sampling Distributions

    Consider a random sample of size $n = 3$ drawn without replacement from a population of $N = 5$ elements. A simple random sample of size $n$ is selected in such a way that every sample of size $n$ has the same probability of being selected, equal to $1 / \binom{N}{n}$, where $\binom{N}{n}$ is the number of possible samples.

    Suppose we have a population of $N = 5$ elements whose values are 3, 6, 9, 12 and 15. Since there are five distinct elements, the population probability distribution is:

    $p(x) = \frac{1}{5}$ for $x = 3, 6, 9, 12, 15$

    Sample   Sample values   $\bar{x}$   $m$
    1        3, 6, 9         6           6
    2        3, 6, 12        7           6
    3        3, 6, 15        8           6
    4        3, 9, 12        8           9
    5        3, 9, 15        9           9
    6        3, 12, 15       10          12
    7        6, 9, 12        9           9
    8        6, 9, 15        10          9
    9        6, 12, 15       11          12
    10       9, 12, 15       12          12

    The table above shows the values of $\bar{x}$ and $m$ (the median) associated with each sample; each sample is assigned probability equal to $\frac{1}{10}$. So, we will observe a value of $\bar{x} = 6$ only if sample 1 is selected, and this occurs with probability 0.1. A value of $\bar{x} = 8$ will occur if sample 3 or sample 4 is drawn; therefore, the probability of observing $\bar{x} = 8$ is 0.2.

    Hence, the sampling distribution for $\bar{x}$ is shown below.

    $\bar{x}$   $p(\bar{x})$
    6           0.1
    7           0.1
    8           0.2
    9           0.2
    10          0.2
    11          0.1
    12          0.1
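As a cross-check, the ten samples and the resulting sampling distributions can be enumerated directly. The sketch below uses only the Python standard library; the helper name `sampling_distribution` is an illustrative choice.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations
from statistics import mean, median

population = [3, 6, 9, 12, 15]
# All samples of size 3 drawn without replacement; each is equally likely.
samples = list(combinations(population, 3))


def sampling_distribution(statistic):
    """Exact distribution of a statistic over the equally likely samples."""
    counts = Counter(statistic(s) for s in samples)
    return {value: Fraction(c, len(samples)) for value, c in sorted(counts.items())}


mean_dist = sampling_distribution(mean)
median_dist = sampling_distribution(median)

for value, prob in mean_dist.items():
    print(value, float(prob))
```

The printed values reproduce the table of $p(\bar{x})$; `median_dist` gives the corresponding distribution for the median $m$.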

    2.3 Central Limit Theorem

    If random samples of $n$ observations are drawn from a non-normal population with finite mean $\mu$ and standard deviation $\sigma$, then, when $n$ is large, the sampling distribution of the sample mean $\bar{x}$ is approximately normally distributed, with mean and standard deviation:

    $$\mu_{\bar{x}} = \mu \quad \text{and} \quad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$
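A small simulation can make the theorem concrete. The sketch below assumes a skewed (exponential) population with mean 2, for which the standard deviation is also 2; the population, sample size and number of repetitions are illustrative assumptions, not part of the notes.

```python
import random
from statistics import mean, pstdev

rng = random.Random(1)
mu = sigma = 2.0           # exponential population: mean and sd are equal
n, num_samples = 40, 5000  # assumed sample size and number of repetitions

# Draw many samples of size n and keep each sample mean.
sample_means = [
    sum(rng.expovariate(1 / mu) for _ in range(n)) / n
    for _ in range(num_samples)
]

print("mean of sample means:", mean(sample_means), "vs mu:", mu)
print("sd of sample means:  ", pstdev(sample_means), "vs sigma/sqrt(n):", sigma / n ** 0.5)
```

The two printed pairs should be close to each other, in line with $\mu_{\bar{x}} = \mu$ and $\sigma_{\bar{x}} = \sigma/\sqrt{n}$.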

    Diagrams


    2.4 Sampling distribution: Sample Mean

    The standard deviation of a statistic used as an estimator of a population parameter is often called the standard error of the estimator, since it refers to the precision of the estimator. Thus, the standard deviation of $\bar{x}$ is referred to as the standard error (s.e.) of the mean.

    Example 2.1

    Suppose that you select a random sample of $n = 25$ observations from a population with mean $\mu = 8$ and $\sigma = 0.6$. Find the probability that the sample mean $\bar{x}$ will

    a) be less than 7.9 b) exceed 7.9 c) lie within 0.1 of $\mu = 8$

    Solution

    a) Since $n = 25$ is relatively large, the sampling distribution of $\bar{x}$ is approximately normal by the CLT.

    Now, $$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{0.6}{\sqrt{25}} = 0.12$$

    $$P(\bar{x} < 7.9) = P\left(\frac{\bar{x} - \mu}{\sigma_{\bar{x}}} < \frac{7.9 - 8.0}{0.12}\right) = P(Z < -0.83) = 0.2033$$

    b) $P(\bar{x} > 7.9) = P(Z > -0.83) = 0.7967$

    c) $P(7.9 < \bar{x} < 8.1) = P(-0.83 < Z < 0.83) = 0.7967 - 0.2033 = 0.5934$
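The three probabilities in Example 2.1 can be checked numerically. This is a minimal sketch using `statistics.NormalDist` from the Python standard library; because it does not round $z$ to two decimals, its answers differ slightly from the table-based values in the solution.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 8.0, 0.6, 25
se = sigma / sqrt(n)             # standard error of the mean, 0.12

xbar = NormalDist(mu, se)        # approximate sampling distribution of the mean

p_less = xbar.cdf(7.9)                    # a) P(xbar < 7.9)
p_more = 1 - p_less                       # b) P(xbar > 7.9)
p_within = xbar.cdf(8.1) - xbar.cdf(7.9)  # c) P(7.9 < xbar < 8.1)

print(round(se, 4), round(p_less, 4), round(p_more, 4), round(p_within, 4))
```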

    2.5 Sampling distribution: Sample Proportion

    Consider a sampling problem involving consumer preference or opinion poll; we

    are concerned with estimating the proportion p of the people in the population

    who possess some specific characteristic. These are practical examples of

    binomial experiments, if the sampling procedure has been conducted in the appropriate

    manner.

    (i) If a random sample of $n$ observations is selected from a binomial population with parameter $p$, then the sampling distribution of the sample proportion $\hat{p} = \dfrac{x}{n}$ will have: $\mu_{\hat{p}} = p$ and $\sigma_{\hat{p}} = \sqrt{\dfrac{pq}{n}}$, where $q = 1 - p$.

    [Figure: the probability that $\bar{x}$ lies within 0.1 of $\mu$]

    (ii) When the sample size is large, the sampling distribution of $\hat{p}$ can be approximated by a normal distribution. The approximation will be adequate if $p - 2\sigma_{\hat{p}}$ and $p + 2\sigma_{\hat{p}}$ fall in the interval 0 to 1.

    (iii) A rule of thumb for the approximation to be satisfactory is that $np > 5$ and $npq > 5$.

    (iv) We use this normal approximation to evaluate the probability that the binomial variable $Y$ is less than or greater than a particular value $y$. This $y$ is an integer, so we must take into account that we are approximating a discrete random variable $Y$ by a continuous random variable $X$. So, we think of the probability mass corresponding to the value $y$ as being spread over the interval $\left(y - \tfrac{1}{2},\, y + \tfrac{1}{2}\right)$.

    Hence, using a continuity correction:

    $P(Y \le y) \approx P\left(X \le y + \tfrac{1}{2}\right)$, i.e. adding a half

    $P(Y \ge y) \approx P\left(X \ge y - \tfrac{1}{2}\right)$, i.e. subtracting a half

    When $X$ is continuous, $P(X = x) = 0$ for any $x$; we can specify probabilities in intervals only, not at points. However, using a normal approximation we can take $P(Y = y)$ to be equal to $P\left(y - \tfrac{1}{2} \le X \le y + \tfrac{1}{2}\right)$. So, $P(Y \ge y)$ and $P(Y > y)$ are not the same; they differ by an amount equal to $P(Y = y)$.

    Hence, using the continuity correction:

    $P(Y < y) \approx P\left(X \le y - \tfrac{1}{2}\right)$

    $P(Y > y) \approx P\left(X \ge y + \tfrac{1}{2}\right)$
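To see what the continuity correction buys, the sketch below compares an exact binomial probability $P(Y \le y)$ with its normal approximation, with and without the added half. The values $n = 50$, $p = 0.4$ and $y = 18$ are hypothetical, chosen only for illustration.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p, y = 50, 0.4, 18   # hypothetical binomial setting
q = 1 - p

# Exact binomial probability P(Y <= y).
exact = sum(comb(n, k) * p**k * q**(n - k) for k in range(y + 1))

# Normal approximation X ~ N(np, npq).
X = NormalDist(n * p, sqrt(n * p * q))
without_cc = X.cdf(y)        # no correction
with_cc = X.cdf(y + 0.5)     # continuity correction: add a half

print(f"exact={exact:.4f}  no correction={without_cc:.4f}  corrected={with_cc:.4f}")
```

In this setting the corrected value lies closer to the exact probability than the uncorrected one.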


    Example 2.2

    In a survey of 313 children, ages 14 to 22, of the nation's top corporate executives, 55% mentioned material and financial gains when asked to identify the best aspect of being privileged in this group.

    a) Describe the sampling distribution of the sample proportion $\hat{p}$.

    b) Assume that the population proportion is 0.5; what is the probability of observing a sample proportion as large as or larger than $\hat{p} = 0.55$?

    Solution

    a) Since the sample size is large, the distribution of $\hat{p}$ is approximately normal, with mean estimated by $\hat{p} = 0.55$ and

    $$\sigma_{\hat{p}} = \sqrt{\frac{pq}{n}} \approx \sqrt{\frac{\hat{p}\hat{q}}{n}} = \sqrt{\frac{0.55 \times 0.45}{313}} = 0.028$$

    Therefore, we know that approximately 95% of the time $\hat{p}$ will fall within $2\sigma_{\hat{p}} \approx 0.056$ of the unknown value of $p$.

    One could check the condition that allows for the normal approximation to the distribution of $\hat{p}$; i.e. $\hat{p} \pm 2\sigma_{\hat{p}} = 0.55 \pm 0.056$, or 0.494 to 0.606, which falls in the interval 0 to 1.

    b) We are given that $\mu_{\hat{p}} = p = 0.5$ and

    $$\sigma_{\hat{p}} = \sqrt{\frac{pq}{n}} = \sqrt{\frac{0.5 \times 0.5}{313}} = 0.0283$$

    $$P(\hat{p} \ge 0.55) = P\left(Z \ge \frac{0.55 - 0.5}{0.0283}\right) = P(Z \ge 1.77) = 0.0384$$

    [Figure: the sampling distribution of $\hat{p}$ based on a sample of $n = 313$ children]

    This tells us that if we were to select a random sample of $n = 313$ observations from a population with proportion $p = 0.5$, the probability that the sample proportion $\hat{p}$ would be as large as or larger than 0.55 is only about 4%.

    Alternatively, using the continuity correction, the equivalent of 0.5 on the proportion scale is $\dfrac{1}{2n} = 0.0016$. So:

    $$P\left(Z \ge \frac{(0.55 - 0.0016) - 0.5}{0.0283}\right) = P(Z \ge 1.71) = 0.0436$$

    When $n$ is large, the effect of using the correction is generally negligible.
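Part (b) of Example 2.2 can be recomputed with and without the continuity correction. The sketch below uses unrounded intermediate values, so its results differ from the table-based figures in the third or fourth decimal place.

```python
from math import sqrt
from statistics import NormalDist

n, p0, p_hat = 313, 0.5, 0.55
se = sqrt(p0 * (1 - p0) / n)     # standard error under p = 0.5

Z = NormalDist()                 # standard normal distribution

# Without continuity correction.
p_plain = 1 - Z.cdf((p_hat - p0) / se)

# With continuity correction: subtract 1/(2n) on the proportion scale.
p_cc = 1 - Z.cdf((p_hat - 1 / (2 * n) - p0) / se)

print(round(se, 4), round(p_plain, 4), round(p_cc, 4))
```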

    2.6 Sampling distribution: Sum or Difference between two sample means

    When independent random samples of $n_1$ and $n_2$ observations have been selected from populations with means $\mu_1$ and $\mu_2$, and variances $\sigma_1^2$ and $\sigma_2^2$ respectively, the sampling distribution of the sum or difference of the sample means will have the following properties:

    (a) The mean and standard deviation of $(\bar{x}_1 \pm \bar{x}_2)$: $\mu_{(\bar{x}_1 \pm \bar{x}_2)} = \mu_1 \pm \mu_2$ and $\sigma_{(\bar{x}_1 \pm \bar{x}_2)} = \sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$

    (b) If the sampled populations are normally distributed, then the sampling distribution is exactly normally distributed regardless of the sample size.

    (c) If the sampled populations are not normally distributed, then the sampling distribution is approximately normally distributed when the sample sizes are large, due to the CLT.


    Example 2.3

    Random samples of 40 teachers each were selected from high schools in Kingston and in St Ann. What is the probability that the sample mean salary from Kingston will exceed the sample mean salary from St Ann by $1500 or more, given that the Kingston mean salary is $29,000, the St Ann mean salary is $28,621, and the standard deviations of the two salary populations are $5000 and $4700 respectively?

    Solution

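    Let $\bar{x}_1$ be the sample mean salary for Kingston and $\bar{x}_2$ be the sample mean salary for St Ann; also, let $\sigma_1$ and $\sigma_2$ be the respective population standard deviations.

    Given that $\mu_1 = 29{,}000$, $\mu_2 = 28{,}621$, $\sigma_1 = 5000$ and $\sigma_2 = 4700$, then

    $$\mu_{(\bar{x}_1 - \bar{x}_2)} = \mu_1 - \mu_2 = 29{,}000 - 28{,}621 = 379$$

    $$\sigma_{(\bar{x}_1 - \bar{x}_2)} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = \sqrt{\frac{5000^2}{40} + \frac{4700^2}{40}} = 1085.0115$$

    Since the sample sizes are large, we can use the normal approximation:

    $$P(\bar{x}_1 - \bar{x}_2 > 1500) = P\left(Z > \frac{1500 - 379}{1085.0115}\right) = P(Z > 1.03) = 1 - P(Z \le 1.03) = 1 - 0.8485 = 0.1515$$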
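The calculation in Example 2.3 can be verified with a short script. It keeps $z$ unrounded, so the probability it prints differs from the table-based answer in the third decimal place.

```python
from math import sqrt
from statistics import NormalDist

mu1, mu2 = 29000, 28621          # population mean salaries
sigma1, sigma2 = 5000, 4700      # population standard deviations
n1 = n2 = 40

mu_diff = mu1 - mu2
se_diff = sqrt(sigma1**2 / n1 + sigma2**2 / n2)

# Approximate sampling distribution of the difference of sample means.
diff = NormalDist(mu_diff, se_diff)
p_exceeds = 1 - diff.cdf(1500)

print(mu_diff, round(se_diff, 4), round(p_exceeds, 4))
```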

    [Figure: the sampling distribution of $(\bar{x}_1 - \bar{x}_2)$]

    2.7 Sampling distribution: Difference between two sample proportions

    Assume that independent random samples of $n_1$ and $n_2$ observations have been selected from binomial populations with parameters $p_1$ and $p_2$ respectively. Then the sampling distribution of the difference between the sample proportions

    $$(\hat{p}_1 - \hat{p}_2) = \frac{x_1}{n_1} - \frac{x_2}{n_2}$$

    will have the following properties:

    (a) The mean and standard deviation of $(\hat{p}_1 - \hat{p}_2)$: $\mu_{(\hat{p}_1 - \hat{p}_2)} = p_1 - p_2$ and $\sigma_{(\hat{p}_1 - \hat{p}_2)} = \sqrt{\dfrac{p_1 q_1}{n_1} + \dfrac{p_2 q_2}{n_2}}$

    (b) The sampling distribution of $(\hat{p}_1 - \hat{p}_2)$ can be approximated by a normal distribution when both sample sizes are large, due to the CLT.

    (c) When we use a normal distribution to approximate binomial probabilities, the interval $(p_1 - p_2) \pm 2\sigma_{(\hat{p}_1 - \hat{p}_2)}$ should lie within the range from $-1$ to 1.

    Example 2.4

    A local newspaper reported that 75% of the residents in the developing section and 60% of the residents in other parts of the city favour passage of a proposed bond issue to build a new school. Random samples of $n_1 = 50$ residents in the developing section of the city and $n_2 = 100$ residents in other parts of the city are selected, and the residents in the samples are asked whether they favour the bond proposal. What is the probability that the difference in magnitude between the sample proportions favouring the bond proposal does not exceed 10%?

    Solution

    Let us assume that $p_1 = 0.75$ and $p_2 = 0.60$, and that the sampling distribution of the difference between the proportions is approximately normal.

    So, $\mu_{(\hat{p}_1 - \hat{p}_2)} = (p_1 - p_2) = 0.75 - 0.60 = 0.15$

    and

    $$\sigma_{(\hat{p}_1 - \hat{p}_2)} = \sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}} = \sqrt{\frac{0.75 \times 0.25}{50} + \frac{0.6 \times 0.4}{100}} = 0.0784$$

    We wish to find $P(-0.1 < \hat{p}_1 - \hat{p}_2 < 0.1)$.

    Hence,

    $$P(-0.1 < \hat{p}_1 - \hat{p}_2 < 0.1) = P\left(\frac{-0.1 - 0.15}{0.0784} < Z < \frac{0.1 - 0.15}{0.0784}\right) = P(-3.19 < Z < -0.64) \approx 0.26$$
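The probability set up in Example 2.4 can be evaluated numerically. The sketch below follows the same normal approximation, with the reported proportions 0.75 and 0.60 treated as the population values.

```python
from math import sqrt
from statistics import NormalDist

p1, n1 = 0.75, 50    # developing section
p2, n2 = 0.60, 100   # other parts of the city

mu_diff = p1 - p2
se_diff = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

# Approximate sampling distribution of the difference of sample proportions.
diff = NormalDist(mu_diff, se_diff)
prob = diff.cdf(0.1) - diff.cdf(-0.1)   # P(-0.1 < p1_hat - p2_hat < 0.1)

print(round(se_diff, 4), round(prob, 4))
```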


    2.8 Large-Sample Estimation

    Since populations are characterised by numerical descriptive measures called

    parameters, statistical inference is concerned with making inferences about

    population parameters. Methods for making inferences about parameters fall into

    one of two categories. We may make decisions concerning the value of the

    parameter, or we may estimate or predict the value of the parameter. Which

    method of inference should be used; that is, should the parameter be estimated or

    should we test a hypothesis concerning its value?

    Estimation procedures can be divided into two types, point estimation and

    interval estimation.

    An estimator is a statistic used to estimate a population parameter; it is a function of the sample observations.

    An estimate, also called a point estimate, is the value an estimator takes for a particular sample.

    An interval estimator of a population parameter tells us how to calculate two numbers based on sample data, forming an interval within which the parameter is expected to lie. This pair of numbers is called an interval estimate or confidence interval.

    Suppose we let $\hat{\theta}$ denote an estimator of the population parameter $\theta$ (which may be $\mu$, $\sigma$, or any other parameter). We would like our estimator to be unbiased and the spread of the sampling distribution of the estimator to be as small as possible.

    The distance between the estimate and the parameter is called the error of estimation.

    The probability that a confidence interval will enclose the estimated parameter is called the confidence coefficient.

    A good confidence interval is one that is as narrow as possible and has a large confidence coefficient, near 1. The narrower the interval, the more exactly we have located the estimated parameter.

    Suppose we want to estimate the mean number of bacteria per cubic centimetre in a polluted stream. If we draw 10 samples, each containing $n = 30$ observations, and construct a confidence interval for the population mean from each sample, the intervals might appear as shown in the diagram below.

    The horizontal line segments represent the ten intervals and the vertical line represents the location of the true mean number of bacteria per cubic cm. Note that the parameter is fixed and that the interval location and width may vary from sample to sample. Thus, we speak of the probability that the interval encloses $\mu$, not the probability that $\mu$ falls in the interval, because $\mu$ is fixed. The interval is random.

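    A $(1 - \alpha)100\%$ confidence interval for $\theta$: $\hat{\theta} \pm z_{\alpha/2}\,\sigma_{\hat{\theta}}$

    where $z_{\alpha/2}$ is the $z$ value corresponding to an area $\alpha/2$ in the upper tail of a standard normal distribution.

    Also, $z_{\alpha/2}\,\sigma_{\hat{\theta}}$ is the bound on the error of estimation.

    $\hat{\theta} + z_{\alpha/2}\,\sigma_{\hat{\theta}}$ is called the upper confidence limit and $\hat{\theta} - z_{\alpha/2}\,\sigma_{\hat{\theta}}$ is called the lower confidence limit.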
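The repeated-sampling meaning of the confidence coefficient can be illustrated by simulation. The sketch below assumes a normal population with known $\sigma$; the values of `mu`, `sigma`, `n` and `trials` are hypothetical and chosen only for the illustration.

```python
import random
from math import sqrt
from statistics import NormalDist


def coverage(mu, sigma, n, conf, trials, seed=0):
    """Fraction of z-intervals for the mean that enclose the true mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # z_{alpha/2}
    half_width = z * sigma / sqrt(n)               # bound on the error
    hits = 0
    for _ in range(trials):
        xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        # The interval is random; mu is fixed.
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / trials


rate = coverage(mu=100.0, sigma=15.0, n=30, conf=0.95, trials=4000)
print(f"proportion of intervals enclosing mu: {rate:.3f}")
```

The printed proportion should be close to the chosen confidence coefficient.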

    [Figure: ten confidence intervals for the mean number of bacteria per cubic cm, each based on a sample of $n = 30$ observations]

    2.9 Confidence Interval (CI) for Population Mean

    A $(1 - \alpha)100\%$ confidence interval for $\mu$: $\bar{x} \pm z_{\alpha/2} \dfrac{\sigma}{\sqrt{n}}$

    Note: If $\sigma$ is unknown, it can be approximated by the sample standard deviation $s$ when the sample size is large.

    Remark: If you want a confidence coefficient $(1 - \alpha)$ equal to 0.95, then the tail-end area is $\alpha = 0.05$ and half of $\alpha$ is placed in each tail of the distribution. Some of the commonly used confidence coefficients are shown in the table below.

    $(1 - \alpha)$   $\alpha$   $z_{\alpha/2}$   LCL                                  UCL
    0.90             0.10       1.645            $\bar{x} - 1.645\,\sigma/\sqrt{n}$   $\bar{x} + 1.645\,\sigma/\sqrt{n}$
    0.95             0.05       1.96             $\bar{x} - 1.96\,\sigma/\sqrt{n}$    $\bar{x} + 1.96\,\sigma/\sqrt{n}$
    0.99             0.01       2.58             $\bar{x} - 2.58\,\sigma/\sqrt{n}$    $\bar{x} + 2.58\,\sigma/\sqrt{n}$

    [Figure: location of $z_{\alpha/2}$]

    Example 2.5

    Suppose that we wish to estimate the mean daily yield of a chemical manufactured in a chemical plant. The daily yield, recorded for 50 days, produced a mean and standard deviation of $\bar{x} = 871$ tons and $s = 21$ tons. Find a 90% confidence interval for the population mean.

    Solution

    A 90% CI for $\mu$: $\bar{x} \pm z_{\alpha/2} \dfrac{\sigma}{\sqrt{n}}$, where $\alpha = 0.1$ and $z_{\alpha/2} = 1.645$, with $\sigma$ approximated by $s = 21$;

    hence, $871 \pm 1.645 \dfrac{21}{\sqrt{50}}$ or $871 \pm 4.89$

    Interpretation:

    i. We estimate the mean daily yield to be no more than 875.89 tons and no less than 866.11 tons.

    ii. In repeated sampling, 90% of the confidence intervals similarly formed will enclose the true value of $\mu$.

    iii. We estimate that the mean daily yield lies in the interval from 866.11 to 875.89 tons.
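The interval in Example 2.5 can be reproduced with a small helper. The function name `mean_ci` is an illustrative choice; it obtains $z_{\alpha/2}$ from `NormalDist().inv_cdf` rather than from a table.

```python
from math import sqrt
from statistics import NormalDist


def mean_ci(xbar, sd, n, conf):
    """Large-sample z confidence interval for a population mean."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # z_{alpha/2}
    margin = z * sd / sqrt(n)                      # bound on the error
    return xbar - margin, xbar + margin


# Example 2.5: 50 days, mean 871 tons, standard deviation 21 tons, 90% CI.
lcl, ucl = mean_ci(xbar=871, sd=21, n=50, conf=0.90)
print(round(lcl, 2), round(ucl, 2))
```

Calling the same function with `conf=0.95` or `conf=0.99` gives the wider intervals corresponding to the other rows of the table of confidence coefficients.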


    Confidence Interval for difference between two means

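    A $(1 - \alpha)100\%$ confidence interval for $(\mu_1 - \mu_2)$:

    $$(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2} \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

    Note: If $\sigma_1^2$ and $\sigma_2^2$ are unknown, but both $n_1$ and $n_2$ are greater than or equal to 30, you can use the sample variances $s_1^2$ and $s_2^2$ to estimate $\sigma_1^2$ and $\sigma_2^2$.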

    Example 2.6

    A comparison of the wearing quality of two types of automobile tyres was obtained by road-testing samples of $n_1 = n_2 = 100$ tyres of each type. The number of miles until wear-out was recorded. Estimate the difference in mean miles to wear-out, give the standard error, and find a 99% CI.

    Tyre 1: $\bar{x}_1 = 26{,}400$ miles and $s_1^2 = 1{,}440{,}000$

    Tyre 2: $\bar{x}_2 = 25{,}100$ miles and $s_2^2 = 1{,}960{,}000$

    Solution

    The point estimate of $(\mu_1 - \mu_2)$ is $(\bar{x}_1 - \bar{x}_2) = 26{,}400 - 25{,}100 = 1300$ miles.

    The standard error (s.e.) of $(\bar{x}_1 - \bar{x}_2)$ is

    $$\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \approx \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = \sqrt{\frac{1{,}440{,}000}{100} + \frac{1{,}960{,}000}{100}} = 184.4$$

    A 99% CI for $(\mu_1 - \mu_2)$: $(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$

    i.e. $1300 \pm (2.58)(184.4)$ or $1300 \pm 475.752$

    Hence, we are 99% confident that the mean difference in miles to wear-out lies between 824.2 and 1775.8.

    Confidence Interval for Proportion

    A $(1 - \alpha)100\%$ confidence interval for $p$: $\hat{p} \pm z_{\alpha/2} \sqrt{\dfrac{\hat{p}\hat{q}}{n}}$

    Example 2.7

    A random sample of 100 voters in a community produced $x = 59$ voters favouring candidate J. Find a point estimate for the proportion of the population who favour candidate J, and a 95% confidence interval for the population proportion.

    Solution

    A point estimate for $p$ is $\hat{p} = \dfrac{x}{n} = \dfrac{59}{100} = 0.59$

    A 95% CI for $p$: $\hat{p} \pm z_{\alpha/2} \sqrt{\dfrac{\hat{p}\hat{q}}{n}}$

    i.e. $0.59 \pm 1.96(0.049)$ or $0.59 \pm 0.09604$

    Therefore, in repeated sampling, 95% of the confidence intervals calculated this way will enclose the true value of $p$.


    Confidence Interval for Proportion Differences

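    A $(1 - \alpha)100\%$ confidence interval for $(p_1 - p_2)$:

    $$(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2} \sqrt{\frac{\hat{p}_1 \hat{q}_1}{n_1} + \frac{\hat{p}_2 \hat{q}_2}{n_2}}$$

    Assumption: $n_1$ and $n_2$ must be sufficiently large so that the sampling distribution of $(\hat{p}_1 - \hat{p}_2)$ can be approximated by a normal distribution.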

    Example 2.8

    A manufacturer of fly spray wished to compare two new sprays, 1 and 2. Two rooms of equal size, each containing 1000 flies, were used in the experiment. Room A was treated with spray 1 and room B with spray 2. A total of 825 and 760 flies succumbed to sprays 1 and 2 respectively. Estimate the difference in the rate of kill for the two sprays and find a 90% confidence interval.

    Solution

    The point estimate of $(p_1 - p_2)$: $(\hat{p}_1 - \hat{p}_2) = 0.825 - 0.76 = 0.065$

    The standard error:

    $$\sqrt{\frac{\hat{p}_1 \hat{q}_1}{n_1} + \frac{\hat{p}_2 \hat{q}_2}{n_2}} = \sqrt{\frac{(0.825)(0.175)}{1000} + \frac{(0.76)(0.24)}{1000}} = 0.01808$$

    A 90% CI for $(p_1 - p_2)$: $(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2} \sqrt{\dfrac{\hat{p}_1 \hat{q}_1}{n_1} + \dfrac{\hat{p}_2 \hat{q}_2}{n_2}}$

    i.e. $0.065 \pm 1.645(0.01808)$ or $0.065 \pm 0.0297$

    Hence, we are 90% confident that the difference between the rates of kill lies between 0.035 and 0.095.
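The arithmetic in Example 2.8 is easy to get wrong by hand, so the sketch below recomputes the standard error and the 90% interval directly from the counts. It takes $z_{\alpha/2}$ from `NormalDist().inv_cdf` rather than from a table.

```python
from math import sqrt
from statistics import NormalDist

x1, n1 = 825, 1000   # flies killed by spray 1, flies in room A
x2, n2 = 760, 1000   # flies killed by spray 2, flies in room B

p1_hat, p2_hat = x1 / n1, x2 / n2
point_estimate = p1_hat - p2_hat

# Estimated standard error of the difference of sample proportions.
se = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)

z = NormalDist().inv_cdf(0.95)     # z_{alpha/2} for a 90% interval
margin = z * se
lcl, ucl = point_estimate - margin, point_estimate + margin

print(round(se, 5), round(margin, 4), round(lcl, 3), round(ucl, 3))
```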


    2.10 Sample size

    How many observations should be included in the sample? Unfortunately, we cannot answer this question without knowing how much information the experimenter wishes to buy.

    Suppose we wish to estimate the mean daily yield and we would like the error of estimation to be less than 4 tons with a probability of 0.95.

    Now 95% of the sample means will lie within $1.96\sigma_{\bar{x}}$ of $\mu$ in repeated sampling; hence, we are asking that $1.96\sigma_{\bar{x}}$ equal 4 tons.

    Thus, $1.96\sigma_{\bar{x}} = 4$, i.e. $1.96 \dfrac{\sigma}{\sqrt{n}} = 4$

    Solving for $n$, we obtain $n = \left(\dfrac{1.96}{4}\right)^2 \sigma^2$ or $n \approx 0.24\sigma^2$

    We assume that $n$ is very large and $\sigma \approx s$.

    Thus, $n \approx 0.24\sigma^2 \approx 0.24(21)^2 \approx 105.9$. Hence, a sample size of 106 is required.

    Procedure

    Let $\theta$ be the parameter to be estimated and let $\sigma_{\hat{\theta}}$ be the standard deviation of the point estimator. Then proceed as follows:

    1) Choose $B$, the bound on the error of estimation, and a confidence coefficient $(1 - \alpha)$.

    2) Assume that $n$ is large; solve the following equation for the sample size $n$:

    $$z_{\alpha/2}\,\sigma_{\hat{\theta}} = B$$

    where $z_{\alpha/2}$ is the value of $z$ having area $\alpha/2$ to its right.
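For the case of estimating a mean, where $\sigma_{\hat{\theta}} = \sigma/\sqrt{n}$, the procedure reduces to $n = (z_{\alpha/2}\,\sigma / B)^2$, rounded up. A minimal sketch follows; the function name `sample_size_for_mean` is an illustrative choice, and $\sigma$ is assumed to be approximated by a prior estimate such as $s$.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_mean(sigma, bound, conf):
    """Smallest n with z_{alpha/2} * sigma / sqrt(n) <= bound."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # z_{alpha/2}
    return ceil((z * sigma / bound) ** 2)


# Daily-yield example: sigma approximated by s = 21 tons, B = 4 tons, 95%.
n_required = sample_size_for_mean(sigma=21, bound=4, conf=0.95)
print(n_required)
```

Halving the bound $B$ roughly quadruples the required sample size, since $n$ is proportional to $1/B^2$.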


    2.11 The p-value

    The smallest value of $\alpha$ for which the test results are statistically significant is often called the p-value or the observed significance level.

    More formally, the p-value (probability value) is the probability of obtaining a result at least as extreme as the one that was observed, assuming that $H_0$ is true.
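For a large-sample test based on a standard normal statistic, the definition above translates into a tail area. The sketch below is only an illustration: the function name `z_test_p_value` and the `tail` argument are assumptions of the sketch, and the value $z = 1.77$ is borrowed from Example 2.2.

```python
from statistics import NormalDist


def z_test_p_value(z, tail="upper"):
    """Tail probability of a standard normal test statistic under H0."""
    Z = NormalDist()
    if tail == "upper":
        return 1 - Z.cdf(z)
    if tail == "lower":
        return Z.cdf(z)
    if tail == "two-sided":
        return 2 * (1 - Z.cdf(abs(z)))
    raise ValueError("tail must be 'upper', 'lower' or 'two-sided'")


p_upper = z_test_p_value(1.77)
p_two = z_test_p_value(1.77, tail="two-sided")
print(round(p_upper, 4), round(p_two, 4))
```

With $\alpha = 0.05$, the upper-tail result would be judged significant while the two-sided result would not, which shows why the direction of the alternative matters.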

    Example

